action #133889

Updated by tinita over 1 year ago

## Observation 

 ``` 
 Firing [stats.openqa-monitor.qa.suse.de] 
 web UI: Minion jobs failed hook alert 
 View alert [stats.openqa-monitor.qa.suse.de] 
 Values 
 A0=21  
 Labels 
 alertname 
 web UI: Minion jobs failed hook alert 
 grafana_folder 
 Salt 
 rule_uid 
 minion_jobs_failed_hook_alert 
 Annotations 
 message 
 Too many minion jobs with failed hook scripts. 
 ``` 
 https://stats.openqa-monitor.qa.suse.de/alerting/grafana/e06a9f3f-205f-4733-b63b-4a84dfea1535/view?orgId=1 
 https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=201&orgId=1&from=1691201258115&to=1691238718573 

 According to the Grafana panel, there were 21 failures between 08:00 and 08:10. However, I don't see matching entries in the gru journal around that time (`journalctl -u openqa-gru --since "2023-08-05 00:00:00"`). 
 There are many more failures around 06:00. 

 Also, for other occurrences of failures (below the alert threshold), the times in Grafana don't match the journal. 
 The journal timestamps are in CEST. Am I missing something?
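
 To rule out a plain time zone mix-up, one option is to convert the `from`/`to` values of the panel URL above into both UTC and CEST and compare them with the journal. This is only a sketch: the helper name `panel_time_range` is made up for illustration, it handles absolute ranges (epoch milliseconds) only, and it hardcodes CEST as UTC+2.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Assumption: the journal timestamps are CEST (UTC+2), as noted in this ticket.
CEST = timezone(timedelta(hours=2), "CEST")


def panel_time_range(url):
    """Return the (from, to) range of a Grafana URL as aware UTC datetimes.

    Hypothetical helper: only absolute ranges given as epoch milliseconds
    are handled, relative ranges such as "now-6h" are not.
    """
    query = parse_qs(urlparse(url).query)
    return tuple(
        datetime.fromtimestamp(int(query[key][0]) / 1000, tz=timezone.utc)
        for key in ("from", "to")
    )


url = (
    "https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary"
    "?viewPanel=201&orgId=1&from=1691201258115&to=1691238718573"
)
start, end = panel_time_range(url)
for label, ts in (("from", start), ("to", end)):
    utc = ts.isoformat(timespec="seconds")
    cest = ts.astimezone(CEST).isoformat(timespec="seconds")
    print(f"{label}: {utc} (UTC) = {cest} (CEST)")
```

 The CEST values can then be passed to `journalctl -u openqa-gru --since ... --until ...` to look at the same window the panel covers.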
