action #152386
[alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S (closed)
Description
Observation
The "Apache Workers" and "HTTP Response" alerts fired on 2023-12-11 at 00:48:00 +0100 CET and 00:49:00 +0100 CET respectively, and both were resolved 5 minutes later. The impact is therefore likely low, if there was any at all.
Suggestions
- Check the impact (so far there doesn't seem to be any), e.g. by looking at jobs that were executed around that time
- Adjust the alert thresholds? We rarely seem to hit that limit, though.
- Check the Grafana state and the systemd journal from the relevant time
- This seems to have been a one-off so far
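To make the threshold suggestion more concrete, here is a minimal Python sketch of how a threshold combined with a pending period decides whether an alert fires. The function `should_fire`, the threshold of 100 and the sample values are hypothetical; the actual Grafana alert rules behind "Apache Workers" and "HTTP Response" are not shown in this ticket.

```python
from typing import Sequence


def should_fire(samples: Sequence[float], threshold: float, pending_samples: int) -> bool:
    """Hypothetical alert rule: fire only if `samples` exceed `threshold`
    for at least `pending_samples` consecutive readings."""
    if pending_samples < 1:
        raise ValueError("pending_samples must be at least 1")
    run = 0
    for value in samples:
        # Count consecutive breaching readings; reset on any healthy reading.
        run = run + 1 if value > threshold else 0
        if run >= pending_samples:
            return True
    return False


# Hypothetical per-minute readings with a 5-minute spike, loosely
# mirroring the ~5 minutes the alerts stayed active on 2023-12-11.
samples = [40, 42, 41, 150, 150, 150, 150, 150, 43, 40]
print(should_fire(samples, threshold=100, pending_samples=1))   # True
print(should_fire(samples, threshold=100, pending_samples=10))  # False
```

Requiring the condition to hold for longer than a short blip trades alert latency for fewer one-off alerts; whether that is acceptable here depends on how quickly a real outage must be noticed.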
Out of scope
- Moving to nginx
Updated by livdywan about 1 year ago
- Subject changed from [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 to [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz about 1 year ago
- Status changed from In Progress to Resolved
In https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=15&orgId=1&from=1701864914643&to=1702894968698 I could see only a single small dip, which did not recur. https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=15&orgId=1&from=1702243017076&to=1702261824564 shows the incident in detail.
In https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1702243017076&to=1702261824564 I could see high CPU load, yet low CPU usage and near-zero I/O bandwidth on all monitored storage devices, i.e. tasks were queued but were neither running on the CPU nor completing I/O. So I suspect that the external hypervisor blocked the VM, which apparently does not happen often. I see no action needed here right now.
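The reasoning above can be expressed as a simple heuristic. This is a minimal sketch under assumptions: the `HostSnapshot` fields and all three thresholds are hypothetical and not taken from the actual Grafana dashboard; it only illustrates the combination of signals that pointed to a stalled VM.

```python
from dataclasses import dataclass


@dataclass
class HostSnapshot:
    """Hypothetical summary of the webui host metrics at one point in time."""
    load_per_cpu: float            # load average divided by the number of CPUs
    cpu_usage_percent: float       # share of CPU time spent busy
    io_bytes_per_s: list[float]    # bandwidth of each monitored storage device


def looks_like_vm_stall(
    s: HostSnapshot,
    high_load: float = 2.0,     # hypothetical threshold
    low_usage: float = 20.0,    # hypothetical threshold, percent
    idle_io: float = 1024.0,    # hypothetical threshold, bytes/s
) -> bool:
    """Return True for the pattern described above: tasks pile up
    (high load) while the CPU does little work and I/O is almost zero."""
    high_load_seen = s.load_per_cpu >= high_load
    cpu_mostly_idle = s.cpu_usage_percent <= low_usage
    # all() is True for an empty list, so require at least one device.
    io_near_zero = bool(s.io_bytes_per_s) and all(
        bw <= idle_io for bw in s.io_bytes_per_s
    )
    return high_load_seen and cpu_mostly_idle and io_near_zero


# Hypothetical readings resembling the incident vs. a normally busy host.
stalled = HostSnapshot(load_per_cpu=4.0, cpu_usage_percent=5.0, io_bytes_per_s=[0.0, 12.0])
busy = HostSnapshot(load_per_cpu=4.0, cpu_usage_percent=90.0, io_bytes_per_s=[5e6, 2e6])
print(looks_like_vm_stall(stalled))  # True
print(looks_like_vm_stall(busy))     # False
```

A match is only circumstantial evidence; confirming that the hypervisor actually blocked the VM would need data from the hypervisor side.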