action #133889

closed

[alert] Minion jobs failed hook alert

Added by tinita 10 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-08-07
Due date:
% Done: 0%
Estimated time:

Description

Observation

Firing [stats.openqa-monitor.qa.suse.de]
web UI: Minion jobs failed hook alert
View alert [stats.openqa-monitor.qa.suse.de]
Values: A0=21
Labels:
  alertname: web UI: Minion jobs failed hook alert
  grafana_folder: Salt
  rule_uid: minion_jobs_failed_hook_alert
Annotations:
  message: Too many minion jobs with failed hook scripts.

https://stats.openqa-monitor.qa.suse.de/alerting/grafana/e06a9f3f-205f-4733-b63b-4a84dfea1535/view?orgId=1
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=201&orgId=1&from=1691201258115&to=1691238718573

According to the Grafana panel there were 21 failures between 08:00 and 08:10, but I don't see matching entries in the gru journal around that time:
journalctl -u openqa-gru --since "2023-08-05 00:00:00"
There are many more failures around 06:00.

Also, for other occurrences of failures (below the alert threshold), the times in Grafana don't match the journal.
The journal timestamps are in CEST. Am I missing something?
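
One way to cross-check the times is to convert the from/to values of the Grafana panel URL above (epoch milliseconds) into both UTC and CEST and compare them with the journal. This is only a sketch: the helper name epoch_ms_to_utc_and_cest is made up for illustration, CEST is modelled as a fixed UTC+2 offset, and it does not by itself show which timezone Grafana or the journal actually used.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset (valid for August).
CEST = timezone(timedelta(hours=2), "CEST")


def epoch_ms_to_utc_and_cest(epoch_ms):
    """Return (utc, cest) datetimes for an epoch value in milliseconds."""
    # Drop the millisecond part; second resolution is enough for this check.
    utc = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)
    return utc, utc.astimezone(CEST)


# from/to query parameters copied from the panel URL in this ticket.
for label, ms in (("from", 1691201258115), ("to", 1691238718573)):
    utc, cest = epoch_ms_to_utc_and_cest(ms)
    print(f"{label}: {utc:%Y-%m-%d %H:%M:%S} UTC = {cest:%Y-%m-%d %H:%M:%S} CEST")
```

The panel range works out to 02:07:38 to 12:31:58 UTC, i.e. 04:07:38 to 14:31:58 CEST on 2023-08-05. Times read off the panel and off the journal can then be compared in the same timezone.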

