action #107257
closed [alert][osd] Apache Response Time alert size:M
Description
Observation
From grafana: [Alerting] Apache Response Time alert
The apache response time exceeded the alert threshold.
- Check the load of the web UI host
- Consider restarting the openQA web UI service and/or apache
Also see https://progress.opensuse.org/issues/73633
Metric name: Min
Value: 2565671.000
view alert rule: http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84&orgId=1
Reproducible
Multiple alerts since at least 2022-02-22, likely also in the days before that.
Suggestions
- okurz already restarted the apache service because it had been running since before the labs were moved. Since then we have had multiple further alerts
- Likely the problem is not apache itself but either the network or our openQA service
- It seems we are smoothing over a rather short time window, so we may not have enough data points because of the data outages. We should look into #107437 first
- Check again how it looks after #107437 is resolved
- Optional: Reconsider how we alert on response times when we actually do not have that many responses
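A minimal sketch of the optional suggestion above, assuming the alert is evaluated over a window of response-time samples in which outages show up as gaps. The function name `should_alert`, the `min_samples` parameter and the threshold value are hypothetical and are not taken from the actual grafana alert rule; the idea is only to skip the threshold check when too few responses were recorded.

```python
from statistics import mean
from typing import Optional, Sequence


def should_alert(
    samples: Sequence[Optional[float]],
    threshold: float,
    min_samples: int = 5,  # hypothetical default, not from the real alert rule
) -> bool:
    """Return True only if enough data points exist and their mean exceeds threshold.

    `samples` holds the response times of the evaluation window; None marks a
    gap where no data arrived (for example during a monitoring outage).
    """
    present = [s for s in samples if s is not None]
    if len(present) < min_samples:
        # Too few responses to judge; leave this case to a separate "no data" alert.
        return False
    return mean(present) > threshold


# A single slow response surrounded by gaps does not trigger the alert.
print(should_alert([None, None, 2565671.0, None], threshold=500000.0))
# A window full of slow responses does.
print(should_alert([600000.0] * 6, threshold=500000.0))
```

With a check like this, a single outlier during a data outage would be handled by the "no data" alert from #107437 instead of raising the response time alert.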
Rollback steps
- okurz paused the alert for https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&editPanel=84&tab=alert, unpause if everything is good again
Updated by okurz almost 3 years ago
- Priority changed from High to Urgent
recurring a lot over the day
Updated by livdywan almost 3 years ago
okurz wrote:
recurring a lot over the day
I can confirm. And in general we get a lot of messages. I suggest finding out whether there's a correlation, or identifying another ticket during estimation, since I've reached alert fatigue.
Updated by okurz almost 3 years ago
- Related to action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M added
Updated by okurz almost 3 years ago
- Related to action #102650: Organize labs move to new building and SRV2 size:M added
Updated by okurz almost 3 years ago
- Subject changed from [alert][osd] Apache Response Time alert to [alert][osd] Apache Response Time alert size:M
- Description updated (diff)
- Status changed from New to Blocked
- Assignee set to okurz
Updated by okurz almost 3 years ago
- Status changed from Blocked to Resolved
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84&orgId=1&from=now-12h&to=now looks good again. Alert unpaused. #107437 resolved. We can resolve here as well as there is nothing else showing up.
Updated by okurz almost 3 years ago
- Related to action #107875: [alert][osd] Apache Response Time alert size:M added