coordination #96447

Updated by mkittler over 2 years ago

## Observation 

 - ~~Alerts for `Disk I/O time for /dev/vdd (/results)`~~ handled in #96554
 - ~~Alerts for `Job age (scheduled)`~~ 
 - Alerts for `Failed systemd services` 
   - This is about the alert from 02.08.21 01:17 (the one from 05.08.21 07:26 was caused by a user's misconfiguration). 

## Suggestion

 - Bump our thresholds 
 - Investigate whether our average load has increased significantly, e.g. because new test groups are being scheduled 
 - Look at the systemd journal while the alert is firing (short of having #96551) 
 - Check if we have data on reduced heat/power in server room 2 
 - ~~`Job age (scheduled) (median)` is likely due to issues with the `WORKER_CLASS` of https://openqa.suse.de/tests/6513484~~
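
For the journal suggestion above, a minimal sketch of how one might build a `journalctl` call covering a window around the alert from 02.08.21 01:17. The helper name `journal_command`, the 30-minute window and the `warning` priority are assumptions for illustration, not something this ticket prescribes. The timestamp is taken as local time on the affected host, which is also an assumption.

```python
import shlex
from datetime import datetime, timedelta


def journal_command(alert_time, window_minutes=30, priority="warning"):
    """Build a journalctl argument list for a window around alert_time.

    Hypothetical helper: window size and priority are illustrative defaults.
    """
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    start = alert_time - timedelta(minutes=window_minutes)
    end = alert_time + timedelta(minutes=window_minutes)
    return [
        "journalctl",
        "--no-pager",
        "--since", start.strftime(fmt),
        "--until", end.strftime(fmt),
        "--priority", priority,
    ]


# Alert time from the observation above (assumed to be host-local time).
cmd = journal_command(datetime(2021, 8, 2, 1, 17))

# Print a copy-pasteable, shell-quoted command line.
print(shlex.join(cmd))
```

While an alert is still firing, `systemctl --failed` on the affected host lists the units currently in the failed state, which narrows down which unit to pass to `journalctl -u`.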
