action #78061

closed

[Alerting] openQA minion workers alert - alert turned "OK" again after 20 minutes and we don't know what was wrong

Added by okurz about 4 years ago. Updated over 3 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2020-11-16
Due date:
% Done: 0%
Estimated time:
Description

Observation

[Alerting] openQA minion workers alert

From: Grafana osd-admins@suse.de
To: osd-admins@suse.de
Sender: osd-admins
List-Id:
Date: 16/11/2020 21.25

/[Alerting] openQA minion workers alert/

Minion workers down. Check systemd services on the openQA host

Metric name    Value
Sum            0.940

However, checking "the openQA host", which I assume means openqa.suse.de, shows no failed systemd services. What does "Minion workers down" have to do with systemd services? Does the message refer only to the service "openqa-gru.service" on osd?

Acceptance criteria

  • AC1: The Grafana panel description and/or alert message explains more clearly what is going on and what should be checked

Suggestions


Related issues 1 (0 open, 1 closed)

Is duplicate of openQA Infrastructure - action #80538: flaky and misleading alerts about "openQA minion workers alert" as well as "Minion Jobs alert" (Resolved, mkittler, 2020-11-27, 2021-04-14)
