action #97136

closed

[alert] multiple unhandled alerts about "broken workers" size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2021-08-18
Due date:
% Done: 0%
Estimated time:
Description

Observation

There is at least one broken worker for more than 15 minutes. Have a look at
https://openqa.suse.de/admin/workers to find out which one it is (click the help
icon to view the concrete error message).
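
As an alternative to clicking through the web UI, the broken workers and their error messages could be filtered out of a worker list programmatically. The following is only a minimal sketch: the field names (`host`, `instance`, `status`, `error`) and the sample data are assumptions for illustration, not a confirmed description of what openQA returns.

```python
from typing import Any


def broken_workers(workers: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (worker name, error message) for every worker marked as broken.

    The dictionary keys used here are assumed for illustration.
    """
    result = []
    for worker in workers:
        if worker.get("status") == "broken":
            name = f"{worker.get('host', '?')}:{worker.get('instance', '?')}"
            result.append((name, worker.get("error") or "no error message"))
    return result


# Hypothetical sample data, not taken from a real openQA instance.
sample = [
    {"host": "worker-a", "instance": 1, "status": "idle", "error": None},
    {"host": "worker-a", "instance": 2, "status": "broken",
     "error": "hypothetical error text"},
]

for name, error in broken_workers(sample):
    print(f"{name}: {error}")
```

Keeping the filter separate from any data retrieval makes it easy to run against a saved worker list when investigating an alert after the fact.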

      Metric name                 Value
      Number of broken workers    15.000

View your Alert rule
http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=96&orgId=1

Suggestions

  • Check the other alert from the start of the week, when broken workers were reported after the weekly reboot
  • Understand https://github.com/os-autoinst/openQA/pull/4122 which is likely to be related
  • Look into worker logs
  • Prevent the situation that workers are reported as broken too soon
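
The last suggestion, together with the "more than 15 minutes" condition from the alert text, amounts to a grace period: only alert when the broken-worker count has stayed at one or more for the whole period. The sketch below illustrates that idea under stated assumptions. The function name `should_alert` and the sample format are hypothetical, and this is not how the Grafana alert rule or openQA itself is implemented.

```python
from datetime import datetime, timedelta


def should_alert(samples, grace=timedelta(minutes=15)):
    """Decide whether to alert based on time-ordered (timestamp, count) samples.

    Returns True only if the broken-worker count has been >= 1 in every
    sample of an unbroken trailing streak lasting longer than `grace`.
    """
    broken_since = None
    for timestamp, count in samples:
        if count >= 1:
            if broken_since is None:
                broken_since = timestamp  # streak starts here
        else:
            broken_since = None  # recovered, reset the streak
    if broken_since is None:
        return False
    return samples[-1][0] - broken_since > grace


# Hypothetical samples taken every 5 minutes.
start = datetime(2021, 8, 18, 8, 0)
counts = [0, 1, 1, 1, 1, 1]
samples = [(start + timedelta(minutes=5 * i), c) for i, c in enumerate(counts)]
print(should_alert(samples))
```

A short recovery resets the streak, so workers that are briefly broken, for example right after a reboot, would not trigger the alert in this sketch.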

Related issues 1 (0 open, 1 closed)

Copied to openQA Infrastructure (public) - action #97139: [alert] multiple unhandled alerts about "malbec: Memory usage alert" size:M (Resolved, mkittler, 2021-08-18, 2021-09-09)
