action #97136
[alert] multiple unhandled alerts about "broken workers" size:M
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2021-08-18
Due date:
% Done: 0%
Estimated time:
Description
Observation
There is at least one broken worker for more than 15 minutes. Have a look at
https://openqa.suse.de/admin/workers to find out which one it is (click the help
icon to view the concrete error message).
Metric name | Value
Number of broken workers | 15.000
View your Alert rule: http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=96&orgId=1
Suggestions
- Check the earlier alert from the start of the week, when broken workers were reported after the weekly reboot
- Understand https://github.com/os-autoinst/openQA/pull/4122, which is likely related
- Look into worker logs
- Prevent workers from being reported as broken too soon
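One way to think about the last suggestion is a per-worker grace period: a worker only counts as broken once it has stayed in that state longer than some threshold. The following is a minimal sketch of that idea, not how openQA or the Grafana alert implements it. The function name `broken_too_long`, the dictionary keys `name`, `status` and `broken_since`, and the sample worker names are all hypothetical; the 15-minute default is taken from the alert text above.

```python
from datetime import datetime, timedelta, timezone

# Assumed grace period; mirrors the 15 minutes named in the alert text.
GRACE_PERIOD = timedelta(minutes=15)


def broken_too_long(workers, now, grace=GRACE_PERIOD):
    """Return the workers that have been broken for longer than `grace`.

    `workers` is a list of dicts with hypothetical keys:
      name          worker identifier, e.g. "worker-a:1"
      status        "broken", "idle", "running", ...
      broken_since  timezone-aware datetime, or None if not broken
    """
    result = []
    for worker in workers:
        if worker.get("status") != "broken":
            continue  # only broken workers are of interest
        since = worker.get("broken_since")
        # Strictly longer than the grace period, so a worker that just
        # came back from a reboot is not reported immediately.
        if since is not None and now - since > grace:
            result.append(worker)
    return result


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    now = datetime(2021, 8, 18, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"name": "worker-a:1", "status": "broken",
         "broken_since": now - timedelta(minutes=30)},
        {"name": "worker-b:1", "status": "broken",
         "broken_since": now - timedelta(minutes=10)},
        {"name": "worker-c:1", "status": "idle", "broken_since": None},
    ]
    for worker in broken_too_long(sample, now):
        print(worker["name"])
```

With the sample data, only the worker that has been broken for 30 minutes is reported; the one broken for 10 minutes is still inside the grace period. Whether 15 minutes is the right threshold after a weekly reboot is exactly the open question in this ticket.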