action #68077

alert about too many failed minion jobs but https://openqa.suse.de/minion/jobs?state=failed shows none

Added by okurz 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Start date: 2020-06-15
Due date:
% Done: 0%
Estimated time:

Description

Observation

I see the alert, but https://openqa.suse.de/minion/jobs?state=failed shows 0 failed jobs.
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&refresh=30s&fullscreen&panelId=19&from=now-1h&to=now shows the numbers jumping from a high value to 0 every minute.
Received an alert email notification on 2020-06-14 04:46: "Too many failed Minion jobs", value Failed 26.797.

History

#1 Updated by okurz 7 months ago

  • Target version set to Ready

#2 Updated by okurz 7 months ago

  • Status changed from New to Resolved
  • Assignee set to okurz

Somehow we overlooked this. With the help of the team we went through the issues and found that the difference is that the workers also publish their Minion job status, so all of that data was intermingled in Grafana. Fixed in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/315
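
To illustrate the effect described above, here is a minimal Python sketch. It assumes that several hosts report a "failed Minion jobs" value under the same metric name; the host names and numbers are made up for illustration, and this is not the actual change from the merge request. Reading the samples without regard to the reporting host gives a series that alternates between a high value and 0, while grouping by host gives one steady series per source.

```python
from collections import defaultdict

# Hypothetical samples as (minute, host, failed_jobs).
# Host names and values are invented for illustration only.
SAMPLES = [
    (0, "webui", 0),
    (0, "worker1", 27),
    (1, "webui", 0),
    (1, "worker1", 27),
]


def intermingled(samples):
    """Return the values in arrival order, ignoring which host sent them."""
    return [failed for _minute, _host, failed in samples]


def per_host(samples):
    """Group the values by host so that each source gets its own series."""
    series = defaultdict(list)
    for _minute, host, failed in samples:
        series[host].append(failed)
    return dict(series)


if __name__ == "__main__":
    print(intermingled(SAMPLES))
    print(per_host(SAMPLES))
```

In this sketch the mixed series alternates between the two sources, whereas the grouped view shows that only the hypothetical worker host reports failures, which would be consistent with the web UI's failed-jobs page showing none.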
