coordination #96447
Updated by mkittler over 2 years ago
## Observation

- ~~Alerts for `Disk I/O time for /dev/vdd (/results)`~~ handled in #96554
- ~~Alerts for `Job age (scheduled)`~~
- Alerts for `Failed systemd services`
  - This is about the alert from 02.08.21 01:17 (the one from 05.08.21 07:26 was caused by a user's misconfiguration).

## Suggestion

- Bump our thresholds
- Investigate whether our average load has increased significantly, e.g. because new test groups are being scheduled
- Look at the systemd journal while the alert is firing (short of having #96551)
- Check if we have data on reduced heat/power in server room 2
- ~~`Job age (scheduled) (median)` is likely due to issues with the `WORKER_CLASS` of https://openqa.suse.de/tests/6513484~~
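
For the `Failed systemd services` alert, the journal check suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (`parse_failed_units`, `journal_command`, `failed_units`) are hypothetical and not part of any existing tooling, and the time window in the usage note is an arbitrary guess around the 02.08.21 01:17 alert. It relies only on the standard `systemctl list-units --state=failed` and `journalctl -u … --since … --until …` invocations.

```python
import subprocess


def parse_failed_units(systemctl_output: str) -> list[str]:
    """Extract unit names from `systemctl list-units --state=failed
    --plain --no-legend` output (one unit per line, name in first column)."""
    units = []
    for line in systemctl_output.splitlines():
        # Some systemd versions prefix failed units with a bullet; drop it.
        fields = [f for f in line.split() if f not in ("●", "*")]
        if fields:
            units.append(fields[0])
    return units


def journal_command(unit: str, since: str, until: str) -> list[str]:
    """Build a journalctl call limited to one unit and a time window."""
    return ["journalctl", "--no-pager", "-u", unit,
            "--since", since, "--until", until]


def failed_units() -> list[str]:
    """Ask systemd for the currently failed units (hypothetical helper)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    return parse_failed_units(result.stdout)
```

Run on the affected host while the alert is firing, something like `for unit in failed_units(): subprocess.run(journal_command(unit, "2021-08-02 01:00", "2021-08-02 01:30"))` would print the journal excerpt for each failed unit; the window would need adjusting to the actual alert time.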