action #107257
[alert][osd] Apache Response Time alert size:M (closed)
Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2022-02-22
Due date:
% Done: 0%
Estimated time:
Description
Observation
From grafana: [Alerting] Apache Response Time alert
The apache response time exceeded the alert threshold.
- Check the load of the web UI host
- Consider restarting the openQA web UI service and/or apache
Also see https://progress.opensuse.org/issues/73633
Metric name | Value
Min | 2565671.000
view alert rule: http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84&orgId=1
Reproducible
Multiple alerts since at least 2022-02-22, likely also during the preceding days.
Suggestions
- okurz already restarted the apache service because it had been running since before the lab move. Since then, however, we have had multiple further alerts
- Likely the problem is not apache itself but either the network or our openQA service
- It seems the alert smooths over a fairly short time window, so we may not have enough data because of the data outages. We should look into #107437 first
- Revisit how the alert behaves after #107437 is resolved
- Optional: Reconsider how we alert on response times when we actually do not have that many responses
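The last two points (short smoothing window, few responses) can be illustrated with a minimal sketch. It is not the Grafana alert rule from this ticket. The function name, parameters, and the idea of a minimum sample count are hypothetical and serve only to show how an alert could skip evaluation when the window holds too few responses to judge.

```python
from statistics import mean


def should_alert(samples, now, window_s, threshold, min_samples):
    """Decide whether to alert on the mean response time in a time window.

    samples: list of (timestamp_s, response_time) tuples (hypothetical format).
    All names and units here are illustrative assumptions.
    """
    # Keep only the samples that fall inside the smoothing window.
    recent = [value for ts, value in samples if now - window_s <= ts <= now]
    # With too few responses the mean is dominated by single outliers,
    # so do not alert on it.
    if len(recent) < min_samples:
        return False
    return mean(recent) > threshold
```

With such a guard, a single slow response during a data outage would not trigger the alert, while a sustained high mean over enough responses still would.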
Rollback steps
- okurz paused the alert; unpause it once everything is good again: https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&editPanel=84&tab=alert