action #95983
closed
alert about "minion workers", alert triggered two times and turned green again
100%
Description
Observation
Received two monitoring alerts, which are also visible on
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=17&orgId=1&from=1627142086249&to=1627190672223 as two small dips.
Acceptance criteria
- AC1: Grafana does not show <1 total minion workers if active == 1
- AC2: The Grafana dashboard has a description of what the panel and its values mean
- AC3: No alerts about "minion workers" for a week
Problem
It is likely better not to count the number of minion workers as a floating-point number. We should investigate why a minion worker was unavailable for nearly 8 minutes. We should also design the alert to tolerate a minion worker being offline for a limited time, as long as this does not impact operations further. Finally, the dashboard must not show something like "0.94" total minion workers (active + inactive) when the active count at that time is actually 1.
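As a rough illustration of the problem, the sketch below assumes the fractional value comes from averaging integer samples over a time window; this is a guess, not something confirmed in this ticket. The sample values and the names `windowed_mean`, `windowed_max`, and `should_alert` are hypothetical and do not correspond to the actual Grafana query or alert rule.

```python
from statistics import mean


def windowed_mean(samples):
    """Average the samples in a window; can yield a fractional worker count."""
    return mean(samples)


def windowed_max(samples):
    """Take the largest sample in a window; always an integer worker count."""
    return max(samples)


def should_alert(samples, minimum=1, grace=3):
    """Alert only if the last `grace` samples are all below `minimum`.

    A single missed sample (a short outage) therefore does not trigger.
    """
    if len(samples) < grace:
        return False
    return all(s < minimum for s in samples[-grace:])


# Hypothetical window: 17 samples of the total worker count, one of them 0.
samples = [1] * 8 + [0] + [1] * 8

print(round(windowed_mean(samples), 2))  # fractional value from averaging
print(windowed_max(samples))             # integer value
print(should_alert(samples))             # short dip is tolerated
print(should_alert([1, 0, 0, 0]))        # sustained outage is reported
```

Under these assumptions, an integer-preserving aggregation avoids displaying a fractional worker count, and a grace period keeps a brief dip from triggering the alert while a sustained outage is still reported.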
Updated by okurz about 3 years ago
- Subject changed from alert about "minion workers" to alert about "minion workers", alert triggered two times and turned green again
- Description updated (diff)
Updated by okurz about 3 years ago
- Status changed from New to Blocked
- Assignee set to okurz
Updated by okurz about 3 years ago
- Status changed from Blocked to Resolved
Checked https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=17&orgId=1&from=now-7d&to=now: all three ACs are now fulfilled.