action #163394

open

Consider extending our logging of broken workers in grafana (Better understand "Broken workers alert" retroactively)

Added by nicksinger about 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2024-07-05
Due date:
% Done: 0%
Estimated time:

Description

Observation

The "Broken workers alert" (https://stats.openqa-monitor.qa.suse.de/alerting/grafana/dZ025mf4z/view?orgId=1) is hard to understand retroactively because we only collect the total number of broken workers, not their names. https://openqa.suse.de/admin/workers shows only the current status of each worker and keeps no history.

Suggestions

  • Consider extending our metrics to also collect each worker (instance) together with its state. This would let us investigate past issues in more detail.
  • Include this information in alert messages if possible. Try not to introduce new alerts.
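
A minimal sketch of what the first suggestion could look like, not a description of the current setup. It assumes the worker list is available as records with `host`, `instance` and `status` keys, and that the metrics end up in a store that accepts InfluxDB line protocol. The measurement name `openqa_worker_state`, the tag names and the example host names are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, which are special in tag values."""
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def worker_state_lines(
    workers: Iterable[Mapping],
    measurement: str = "openqa_worker_state",  # hypothetical measurement name
    timestamp_ns: Optional[int] = None,
) -> List[str]:
    """Build one line-protocol point per worker instance.

    Each point carries the worker name and its state as tags, plus an
    integer field `broken` (1 or 0) that can be summed to get the total
    we already collect today.
    """
    lines = []
    for worker in workers:
        # Assumed record shape: {"host": ..., "instance": ..., "status": ...}
        name = f"{worker['host']}:{worker['instance']}"
        status = str(worker["status"])
        broken = 1 if status == "broken" else 0
        line = (
            f"{measurement},worker={escape_tag(name)},state={escape_tag(status)} "
            f"broken={broken}i"
        )
        if timestamp_ns is not None:
            line += f" {timestamp_ns}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real instance.
    sample = [
        {"host": "worker1", "instance": 1, "status": "broken"},
        {"host": "worker1", "instance": 2, "status": "idle"},
    ]
    print("\n".join(worker_state_lines(sample)))
```

Keeping the worker name as a tag means a query over the alert's time range can list which instances were broken, and summing the `broken` field per timestamp still yields the existing total, so the current alert rule would not need to be replaced. The cost is one series per worker instance and state.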