action #78061

[Alerting] openQA minion workers alert - alert turned "OK" again after 20 minutes and we don't know what was wrong

Added by okurz 8 months ago. Updated 5 months ago.

Status: Rejected
Priority: Normal
Assignee:
Target version:
Start date: 2020-11-16
Due date:
% Done: 0%
Estimated time:

Description

Observation

[Alerting] openQA minion workers alert

From: Grafana osd-admins@suse.de
To: osd-admins@suse.de
Sender: osd-admins
List-Id:
Date: 16/11/2020 21.25

/[Alerting] openQA minion workers alert/

Minion workers down. Check systemd services on the openQA host

Metric name    Value
Sum            0.940

However, checking "the openQA host" (I guess that means openqa.suse.de) shows no failed systemd services. What does "minion workers down" have to do with systemd services? Should the message refer only to the service "openqa-gru.service" on osd?
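For illustration, the manual check described above could be scripted roughly as follows. This is only a sketch under assumptions: the unit list contains just "openqa-gru.service", which is the guess from this ticket and not something the alert text confirms, and the helper names `unit_state` and `inactive_units` are hypothetical. It relies on `systemctl is-active`, which prints the unit state and exits non-zero for anything other than an active unit.

```python
import subprocess

# Assumption: openqa-gru.service is the unit behind the Minion workers,
# as guessed in this ticket; the alert text does not confirm this.
UNITS = ["openqa-gru.service"]


def unit_state(unit, run=subprocess.run):
    """Return the state printed by `systemctl is-active` for one unit."""
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive or failed units
    )
    return result.stdout.strip() or "unknown"


def inactive_units(units=UNITS, run=subprocess.run):
    """Return a dict of the units whose state is anything other than 'active'."""
    states = {unit: unit_state(unit, run) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}
```

Calling `inactive_units()` on the host in question returns an empty dict when every listed unit reports "active". An empty result would match the observation above that no systemd service had failed while the alert was firing, which is why the alert text needs a better explanation.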

Acceptance criteria

  • AC1: The grafana panel description and/or alert has a better explanation of what is going on and what should be checked

Suggestions


Related issues

Is duplicate of openQA Infrastructure - action #80538: flaky and misleading alerts about "openQA minion workers alert" as well as "Minion Jobs alert" | Resolved | 2020-11-27 | 2021-04-14

History

#1 Updated by okurz 5 months ago

  • Is duplicate of action #80538: flaky and misleading alerts about "openQA minion workers alert" as well as "Minion Jobs alert" added

#2 Updated by okurz 5 months ago

  • Status changed from Workable to Rejected
  • Assignee set to okurz
