action #162602
Updated by okurz 6 months ago
## Observation
With #162374, w40 (worker40.oqa.prg2.suse.org) is the only OSD PRG2 x86_64 tap worker, and due to the size of the openQA job queue it is executing openQA jobs near-continuously. Now one alert triggered about too high CPU load and another about a partition getting full, similar to #162596.
## Suggestions
* Maybe the high CPU load was caused by the lack of space - which is tracked in #162596
* Are tests passing successfully on worker40? If there are no signs of typing or similar issues, bump the alert threshold.
* Lower the load limit
* Check the number of worker slots and, e.g., reduce it according to the load. Maybe the capacity was already too high before and we didn't notice.
* Take #162596 into account
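
To make the "reduce worker slots according to the load" suggestion more concrete, here is a minimal sketch of how a reduced slot count could be estimated. It is not taken from openQA or the salt states: the function names `load_ratio` and `suggest_slots` are hypothetical, and the sketch assumes that load scales roughly linearly with the number of active worker slots, which may not hold on a real worker.

```python
import os


def load_ratio(load_avg: float, cpu_count: int) -> float:
    """Return the load average normalized per logical CPU."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def suggest_slots(current_slots: int, load_avg: float, load_limit: float) -> int:
    """Suggest a slot count that would bring load_avg down to load_limit.

    Assumption (hypothetical model): load is proportional to the number
    of slots. The result is never below 1 and never above current_slots.
    """
    if current_slots <= 0 or load_limit <= 0:
        raise ValueError("current_slots and load_limit must be positive")
    if load_avg <= load_limit:
        # Load is within the limit, so keep the current configuration.
        return current_slots
    return max(1, int(current_slots * load_limit / load_avg))


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    _, _, load15 = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"15 min load per core: {load_ratio(load15, cores):.2f}")
```

Run on the worker itself, the script prints the 15-minute load per core. `suggest_slots` can then be called with the currently configured slot count and the chosen load limit to get a rough starting point, which should still be checked against actual job results.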
## Rollback actions
* Remove the silence for alert `rule_uid=~load_alert_worker40` from https://monitor.qa.suse.de/alerting/silences