action #162602

Updated by okurz 6 months ago

## Observation

With #162374, w40 (worker40.oqa.prg2.suse.org) is the only OSD PRG2 x86_64 tap worker, and because of the size of the openQA job queue it executes openQA jobs near-continuously. Now two alerts have triggered: one about too high CPU load and one about a partition getting full. This is similar to #162596.

## Suggestions

* Maybe the high CPU load was caused by the lack of space, which is tracked in #162596
* Are tests passing successfully on worker40? If there are no signs of typing or similar issues, bump the alert threshold
* Lower the load limit
* Check the number of worker slots and, e.g., reduce it according to the load; maybe the capacity was already too high before and we didn't notice
* Take #162596 into account
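
To help decide on the number of worker slots, a minimal sketch could compare the 15-minute load average against the CPU count. It assumes that load per core is a reasonable proxy for overload. The linear scaling heuristic, the target ratio of 1.0 and the slot count of 50 are hypothetical illustrations, not how the alert or openQA compute anything:

```python
import math
import os


def load_ratio(load_avg: float, cpu_count: int) -> float:
    """Return the load average normalized per CPU core."""
    return load_avg / cpu_count


def suggested_slots(current_slots: int, load_avg: float, cpu_count: int,
                    target_ratio: float = 1.0) -> int:
    """Hypothetical heuristic: scale the slot count down linearly so the
    per-core load would approach target_ratio. Never suggests more slots
    than currently configured and never fewer than one."""
    ratio = load_ratio(load_avg, cpu_count)
    if ratio <= target_ratio:
        return current_slots
    return max(1, math.floor(current_slots * target_ratio / ratio))


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only)
    _, _, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    slots = 50  # hypothetical: replace with the number of configured worker slots
    print(f"15-min load per core: {load_ratio(load15, cpus):.2f}")
    print(f"suggested slots: {suggested_slots(slots, load15, cpus)}")
```

For example, with 40 slots, 48 cores and a 15-minute load of 96, the per-core load is 2.0 and the sketch suggests halving to 20 slots. Linear scaling is only a first guess, since not every openQA job produces the same load.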

## Rollback actions

* Remove the silence for alert `rule_uid=~load_alert_worker40` from https://monitor.qa.suse.de/alerting/silences
