action #97136
Updated by okurz over 3 years ago
## Observation
At least one worker has been broken for more than 15 minutes. Have a look at
https://openqa.suse.de/admin/workers to find out which one it is (click the help
icon to view the specific error message).
| Metric name | Value |
| --- | --- |
| Number of broken workers | 15.000 |
View your Alert rule
<http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=96&orgId=1>
## Suggestions
* Check the other alert from the start of the week, when broken workers were reported after the weekly reboot
* Understand https://github.com/os-autoinst/openQA/pull/4122 which is likely to be related
* Look into worker logs
* Prevent workers from being reported as broken too soon
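
As a complement to clicking through the admin page, the broken workers could be listed from a script. The following is only a minimal sketch: it assumes the web UI serves a JSON worker list under `/api/v1/workers` and that each entry carries `host`, `instance`, `status` and, for broken workers, an `error` field. The helper names `broken_workers` and `fetch_workers` are made up for illustration, and the field names should be checked against the actual response before relying on this.

```python
import json
import urllib.request

# Assumption: the worker list is available as JSON under this route and each
# entry has "host", "instance", "status" and optionally "error".
WORKERS_URL = "https://openqa.suse.de/api/v1/workers"


def broken_workers(payload):
    """Return (name, error) pairs for workers whose status is "broken"."""
    result = []
    for worker in payload.get("workers", []):
        if worker.get("status") == "broken":
            name = f"{worker.get('host')}:{worker.get('instance')}"
            result.append((name, worker.get("error")))
    return result


def fetch_workers(url=WORKERS_URL):
    """Download and decode the worker list (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print one line per broken worker together with its reported error.
    for name, error in broken_workers(fetch_workers()):
        print(f"{name}: {error}")
```

Keeping the filtering in `broken_workers` separate from the download makes it possible to try the logic on a saved response without touching the production instance.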