action #95983

closed

alert about "minion workers", alert triggered two times and turned green again

Added by okurz over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-07-27
Due date: 2021-09-02
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Observation

Received two monitoring alerts. Both are also visible as two small dips in the graph on
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=17&orgId=1&from=1627142086249&to=1627190672223

Acceptance criteria

  • AC1: grafana does not show <1 total minion workers if active == 1
  • AC2: the grafana dashboard has a description of what this all means
  • AC3: No alerts about "minion workers" for a week

Problem

It is likely better not to count the number of minion workers as a floating-point number. We should investigate why a minion worker would be unavailable for nearly 8 minutes. We should also design the alert to be resilient when a minion worker is offline for a limited time and this does not impact operations further. Finally, we should prevent showing something like "0.94" total minion workers (active + inactive) when the number of active workers at that time is actually 1.
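
To illustrate the two points above, here is a minimal sketch. It assumes (this is not confirmed by the ticket) that the fractional value comes from averaging integer samples over a time window, and that samples arrive once per minute. The function names, the sample data and the 10-sample grace period are hypothetical and only serve the illustration; this is not the actual Grafana query or alert rule.

```python
def windowed_mean(samples):
    """Average over a window; one missing worker yields a fractional count."""
    return sum(samples) / len(samples)


def latest_count(samples):
    """Report the most recent sample as an integer instead of an average."""
    return int(samples[-1])


def should_alert(counts, threshold=1, grace_samples=10):
    """Fire only if the count stays below `threshold` for more than
    `grace_samples` consecutive samples (hypothetical grace period)."""
    below = 0
    for count in counts:
        # Reset the streak as soon as the count recovers.
        below = below + 1 if count < threshold else 0
        if below > grace_samples:
            return True
    return False


# Hypothetical window of 16 samples with a single sample at zero.
total_samples = [1] * 7 + [0] + [1] * 8
print(round(windowed_mean(total_samples), 2))  # fractional value
print(latest_count(total_samples))             # integer value

# A worker that is offline for 8 samples and then recovers.
short_outage = [1] * 5 + [0] * 8 + [1] * 5
# A worker that stays offline for 12 samples.
long_outage = [1] * 5 + [0] * 12
print(should_alert(short_outage), should_alert(long_outage))
```

With these assumptions, the averaged window gives a fractional total while the latest sample stays an integer, and an outage shorter than the grace period does not fire the alert. The right grace period depends on how long a worker may be offline without impacting operations, which the ticket leaves open.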


Subtasks 2 (0 open, 2 closed)

  • action #96089: alert about "minion workers" - make meaning of grafana panel clear size:S | Resolved | mkittler | 2021-07-27
  • action #96380: "minion workers" alert shows <1 total minion workers if active == 1 size:M | Resolved | mkittler | 2021-09-02
