action #179497
[FIRING:1] worker-arm1 (worker-arm1: System load alert openQA worker-arm1 salt system_load_alert_worker-arm1 worker)
Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
The load exceeded our expected limits for 15 minutes and triggered the corresponding alert. It started at around 20:30 UTC, peaked at 21:00, and seems to have normalized again at around 21:30.
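The "above the limit for 15 minutes" condition can be sketched as follows. This is only an illustration of a sustained-threshold check, not the actual alert rule: the function name, the sample data, the date, and the limit of 10 are hypothetical, since the ticket does not state the configured limit.

```python
from datetime import datetime, timedelta


def first_sustained_breach(samples, limit, window=timedelta(minutes=15)):
    """Return the timestamp at which the load has stayed above `limit`
    for at least `window`, or None if that never happens.

    `samples` is an iterable of (timestamp, load) pairs in time order.
    """
    breach_start = None
    for ts, load in samples:
        if load > limit:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
            if ts - breach_start >= window:
                return ts
        else:
            breach_start = None  # a dip below the limit resets the timer
    return None


# Hypothetical samples; the date and load values are placeholders.
t0 = datetime(2025, 1, 1, 20, 25)
loads = [5, 12, 15, 18, 20]
samples = [(t0 + timedelta(minutes=5 * i), load) for i, load in enumerate(loads)]
fired_at = first_sustained_breach(samples, limit=10)
```

With these placeholder samples the breach starts at 20:30 and the condition is met at 20:45, once the load has stayed above the limit for the full window.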
Acceptance Criteria
- AC1: No alerts about high load for normal openQA workloads on worker-arm1
Suggestions
- Look for clues about what caused the high load at the time
- Consider https://progress.opensuse.org/issues/164284
Files
Updated by nicksinger 8 days ago
- Copied from action #164284: [FIRING:1] worker-arm1 (worker-arm1: System load alert openQA worker-arm1 salt system_load_alert_worker-arm1 worker) size:S added
Updated by nicksinger 8 days ago
- Status changed from New to Feedback
- Assignee set to nicksinger
Updated by nicksinger 7 days ago
- File clipboard-202503271035-h2fvr.png added
- Status changed from Feedback to Resolved
The changes are effective: several slots are currently paused with a message stating that the configured threshold of 14 is in place. Load still peaks, but not as high as before (see the attached clipboard-202503271035-h2fvr.png).
I removed the alert silence for that host again.
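For illustration, the pausing behaviour described above amounts to comparing the system load average against the configured threshold. The sketch below is not openQA's implementation: the function names are hypothetical, and both the use of the 1-minute average and the `>=` comparison are assumptions. Only the threshold value of 14 comes from this ticket.

```python
import os

LOAD_THRESHOLD = 14.0  # threshold value quoted in this ticket


def should_pause(load_avg, threshold=LOAD_THRESHOLD):
    """Return True if a slot should stop picking up new jobs.

    Treating a load equal to the threshold as "pause" is an assumption.
    """
    return load_avg >= threshold


def current_load():
    """Return the 1-minute load average of this host (Unix only).

    Which averaging window the real worker looks at is an assumption.
    """
    return os.getloadavg()[0]
```

On a worker host, `should_pause(current_load())` would then report whether the slot would be paused under these assumptions.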