action #163394
Consider extending our logging of broken workers in Grafana (better understand the "Broken workers alert" retroactively)
Status: open
Start date: 2024-07-05
% Done: 0%
Description
Observation
The "Broken workers alert" (https://stats.openqa-monitor.qa.suse.de/alerting/grafana/dZ025mf4z/view?orgId=1) is hard to understand retroactively because we only collect the total number of broken workers, not their names. https://openqa.suse.de/admin/workers only shows the current status of each worker and keeps no history.
Suggestions
- Consider extending our metrics to also collect each worker instance together with its state. This would allow us to better understand past issues.
- Include this information in alert messages if possible. Try not to introduce new alerts.
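A minimal sketch of both suggestions, assuming the metrics end up in InfluxDB via line protocol (for example through a Telegraf exec input with `data_format = "influx"`; this pipeline is an assumption, not confirmed by this ticket). The `Worker` shape, the status values ("idle", "broken", ...), the measurement name `openqa_worker_state` and the helpers `to_line_protocol` and `broken_summary` are all hypothetical; real field names would have to come from the openQA workers API response. Storing `host` and `instance` as tags keeps the worker identity queryable in Grafana, and the `broken` field lets the existing alert keep counting without adding a new one.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Worker:
    # Hypothetical record shape; real field names depend on the openQA API response.
    host: str
    instance: int
    status: str  # assumed values such as "idle", "running", "broken"


def _escape_tag(value: str) -> str:
    # Line protocol: commas, equals signs and spaces in tag values must be escaped.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_str_field(value: str) -> str:
    # String field values are double-quoted; escape backslashes and double quotes.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_line_protocol(
    workers: Iterable[Worker],
    measurement: str = "openqa_worker_state",  # hypothetical measurement name
    timestamp_ns: Optional[int] = None,
) -> List[str]:
    """Emit one line per worker instance, tagged by host and instance."""
    lines = []
    for w in workers:
        tags = f"host={_escape_tag(w.host)},instance={w.instance}"
        broken = 1 if w.status == "broken" else 0
        fields = f'state="{_escape_str_field(w.status)}",broken={broken}i'
        line = f"{measurement},{tags} {fields}"
        if timestamp_ns is not None:
            line += f" {timestamp_ns}"
        lines.append(line)
    return lines


def broken_summary(workers: Iterable[Worker]) -> str:
    """Build a short text listing broken workers, e.g. for an alert message."""
    names = sorted(f"{w.host}:{w.instance}" for w in workers if w.status == "broken")
    if not names:
        return "No broken workers"
    return f"{len(names)} broken worker(s): " + ", ".join(names)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [Worker("worker1", 1, "idle"), Worker("worker2", 3, "broken")]
    for line in to_line_protocol(sample):
        print(line)
    print(broken_summary(sample))
```

Running the sample prints `openqa_worker_state,host=worker2,instance=3 state="broken",broken=1i` for the broken instance and `1 broken worker(s): worker2:3` as the summary. In Grafana, the per-worker tags could then be surfaced in the existing alert's message through its labels rather than through a separate alert rule.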