action #94237

Updated by okurz almost 3 years ago

## Observation 
 https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?tab=alert&editPanel=9&viewPanel=9&orgId=1&from=1623621600000&to=1623967199000 shows that there were more than 3k scheduled tests for an extended period, but Grafana does not indicate that any alert messages were sent. 

 ## Acceptance criteria 
 * **AC1:** Alert messages are sent when scheduled jobs exceed a defined alert threshold 

 ## Suggestions 
 * Investigate why there was a sudden surge to nearly 9k blocked jobs on 2021-06-14. Was that because openQABot was offline in the days before? -> Yes, we assume that was the case 
 * Investigate whether an alert message was sent but simply does not show up in Grafana -> According to http://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.06/maillist.html no such email was received by osd-admins@suse.de 
 * Crosscheck system logs on the grafana instance 
 * Ensure that alert messages are sent
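
To make AC1 easier to reason about, here is a minimal sketch of a "value above threshold for a sustained period" check. It is not the Grafana rule itself. The names `AlertRule` and `first_alert_time`, the threshold of 3000, and the 120-second hold time are assumptions chosen for illustration; the ticket only states that more than 3k tests were scheduled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class AlertRule:
    """Hypothetical alert rule; both values are illustrative assumptions."""
    threshold: int    # alert when the scheduled job count is above this
    for_seconds: int  # the breach must last at least this long


def first_alert_time(
    samples: Iterable[Tuple[int, int]], rule: AlertRule
) -> Optional[int]:
    """Return the timestamp at which an alert should fire, or None.

    `samples` is a time-ordered sequence of (unix_timestamp, scheduled_jobs).
    """
    breach_start: Optional[int] = None
    for timestamp, scheduled_jobs in samples:
        if scheduled_jobs > rule.threshold:
            # Remember when the current breach began.
            if breach_start is None:
                breach_start = timestamp
            # Fire once the breach has lasted long enough.
            if timestamp - breach_start >= rule.for_seconds:
                return timestamp
        else:
            # The value recovered, so any running breach is reset.
            breach_start = None
    return None


if __name__ == "__main__":
    rule = AlertRule(threshold=3000, for_seconds=120)
    samples = [(0, 1000), (60, 3500), (120, 9000), (180, 8800), (240, 2000)]
    print(first_alert_time(samples, rule))
```

Replaying exported panel data through a check like this would show when an alert was expected to fire. Those timestamps could then be compared with the Grafana system logs and the osd-admins mail archive.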
