action #107875
Updated by livdywan almost 3 years ago
## Observation

We've got the alert [again](https://progress.opensuse.org/issues/107257) on March 3, 2022 09:00:40:

```
[Alerting] Apache Response Time alert

The apache response time exceeded the alert threshold.
* Check the load of the web UI host
* Consider restarting the openQA web UI service and/or apache

Also see https://progress.opensuse.org/issues/73633

Metric name    Value
Min            18733128.83
```

Relevant panel: https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84

---

Tina wrote in chat

> if anyone was wondering about the short high load on osd, I fetched /api/v1/jobs and it took 10 minutes

but that was already on Wednesday, so it shouldn't have caused this.

Further data points

- High CPU likely didn't affect scheduling, or we should've had other reports of it
- High CPU wouldn't cause a spike in failures in jobs?

## Suggestions

* The apache log parsing seems to be quite heavy. Can we reduce the amount of data parsed by telegraf?
* Reduce the interval at which we take new data points in telegraf
* Extend the alerting measurement period from 5m to 30m (or higher) to smooth out gaps
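
To illustrate the last suggestion, here is a minimal sketch of how a longer measurement period can smooth out a short spike. It is not the actual Grafana alert rule: the sample data, the threshold, the one-sample-per-minute rate and the use of a mean over the window are all assumptions made for illustration, and `windowed_means` and `alert_fires` are hypothetical helpers.

```python
from statistics import mean


def windowed_means(samples, window):
    """Mean over the trailing `window` samples at each point in time."""
    return [
        mean(samples[max(0, i - window + 1): i + 1])
        for i in range(len(samples))
    ]


def alert_fires(samples, window, threshold):
    """True if any windowed mean exceeds the threshold."""
    return any(m > threshold for m in windowed_means(samples, window))


# Hypothetical data: one sample per minute, steady at 100 (arbitrary units)
# with a single 3-minute spike to 1000.
samples = [100] * 30 + [1000] * 3 + [100] * 30
THRESHOLD = 300  # hypothetical alert threshold

print("5m window fires: ", alert_fires(samples, 5, THRESHOLD))
print("30m window fires:", alert_fires(samples, 30, THRESHOLD))
```

With these made-up numbers the 5-sample window peaks at a mean of 640 and fires, while the 30-sample window peaks at 190 and stays quiet. The trade-off is that a longer window also delays alerts for real, sustained slowdowns.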