[Alerting] openQA minion workers alert - alert turned "OK" again after 20 minutes and we don't know what was wrong
The alert "[Alerting] openQA minion workers alert" fired with the message: "Minion workers down. Check systemd services on the openQA host"
But checking "the openQA host", which I assume means openqa.suse.de, shows no failed systemd services. What does "minion workers down" have to do with systemd services? Should the message refer only to the service "openqa-gru.service" on osd?
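One possible reason why no failed services show up: a unit that crashed and was restarted automatically via `Restart=` is back in state "active" by the time someone checks, yet its `NRestarts` property still counts the restart. Below is a minimal Python sketch of such a check. It assumes, as the question above suggests, that the minion workers correspond to `openqa-gru.service` and that this unit has `Restart=` configured. Neither is confirmed here, and the helper names are hypothetical; only `systemctl show --property=` is a real interface.

```python
import subprocess

# Assumption: the minion workers run as this unit (not confirmed in the ticket).
UNIT = "openqa-gru.service"
PROPERTIES = ("ActiveState", "SubState", "NRestarts", "ActiveEnterTimestamp")


def parse_show_output(text: str) -> dict[str, str]:
    """Parse the Key=Value lines printed by `systemctl show`."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            result[key] = value
    return result


def unit_state(unit: str = UNIT) -> dict[str, str]:
    """Query systemd for a few properties of `unit` (needs a systemd host)."""
    cmd = ["systemctl", "show", unit] + [f"--property={p}" for p in PROPERTIES]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show_output(out)


def was_auto_restarted(state: dict[str, str]) -> bool:
    """NRestarts counts restarts triggered by Restart=; a value above zero
    means the unit failed at some point even if it is "active" now."""
    return int(state.get("NRestarts", "0") or 0) > 0
```

Comparing `ActiveEnterTimestamp` with the time the alert fired would then show whether a restart explains the alert.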
- AC1: The Grafana panel description and/or alert gives a better explanation of what is going on and what should be checked
- Understand the data source for https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=query&editPanel=17&orgId=1&refresh=30s
- Extend the Grafana panel and/or alert description with a better explanation and specific instructions on what to do, e.g. also which log of which service we should look into in case we cannot see anything wrong at the time of checking because something has already "resolved itself"
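For the log part of the last suggestion, here is a minimal Python sketch that builds a `journalctl` call covering the time window around an alert. It again assumes that `openqa-gru.service` is the relevant unit. The function names, the 10-minute margin and the example timestamp are hypothetical. `journalctl` interprets `--since`/`--until` in the host's local time zone, so an alert time reported in UTC would need converting first.

```python
import subprocess
from datetime import datetime, timedelta

# Assumption: the unit whose log explains the alert (not confirmed in the ticket).
UNIT = "openqa-gru.service"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # a format journalctl accepts for --since/--until


def journal_command(unit: str, alert_start: datetime, duration: timedelta,
                    margin: timedelta = timedelta(minutes=10)) -> list[str]:
    """Build a journalctl invocation for the alert window plus a margin on each side."""
    since = (alert_start - margin).strftime(TIME_FORMAT)
    until = (alert_start + duration + margin).strftime(TIME_FORMAT)
    return [
        "journalctl",
        f"--unit={unit}",
        f"--since={since}",
        f"--until={until}",
        "--no-pager",
        "--output=short-iso",  # ISO timestamps are easier to match with Grafana
    ]


def fetch_journal(cmd: list[str]) -> str:
    """Run the command on the openQA host and return the log text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For an alert that lasted 20 minutes, like the one in this ticket, `journal_command(UNIT, start, timedelta(minutes=20))` covers the alert plus ten minutes on either side.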