action #138044

closed

Grouped seemingly unrelated alert emails are confusing size:M

Added by livdywan about 1 year ago. Updated 11 months ago.

Status: Rejected
Priority: Low
Assignee:
Category: -
Start date: 2023-10-09
Due date:
% Done: 0%
Estimated time:
Description

Observation

The email subject says FIRING:2, and it points to multiple seemingly unrelated alerts including

  • Packet loss between worker hosts and other hosts alert
  • Failed systemd services alert (except openqa.suse.de)

which makes it unclear what is actually being alerted about.

Acceptance criteria

  • AC1: Alert email subject lines make it possible to identify the related alerts
  • AC2: We are still not receiving "too many" (subjective) emails for related alerts

Suggestions

  • Clarify why these alerts are being grouped
  • Find a way to give the email a useful subject
  • Group/ungroup alerts based on clear patterns
  • If there is no other way, consider ungrouping completely again, in coordination with the team
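
The grouping and subject-line suggestions above can be illustrated with a small sketch. This is not how Grafana implements grouping or builds its subjects; it only shows the idea of grouping alerts by a chosen set of labels and naming the alerts in the subject instead of just counting them. The label keys (`alertname`, `hostname`), the host values, and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname",)):
    """Group alert dicts by the values of the given label keys (hypothetical helper)."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts that share the same values for all group_by labels land in one group.
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def email_subject(alerts):
    """Build a subject that names the alerts in a group (hypothetical helper)."""
    names = sorted({a["labels"].get("alertname", "unknown") for a in alerts})
    return f"FIRING:{len(alerts)} " + ", ".join(names)


# Example data modelled on the two alerts from the observation; host names are made up.
alerts = [
    {"labels": {"alertname": "Packet loss between worker hosts and other hosts alert",
                "hostname": "worker-a"}},
    {"labels": {"alertname": "Failed systemd services alert (except openqa.suse.de)",
                "hostname": "worker-b"}},
]

for group in group_alerts(alerts).values():
    print(email_subject(group))
```

Grouping by `alertname` yields one email per alert rule, which addresses AC1 but may increase the number of emails; an empty `group_by` puts everything into one group, which is closer to the behaviour described in the observation.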

Related issues 5 (1 open, 4 closed)

Related to openQA Infrastructure (public) - action #137600: [alert] Packet loss between worker hosts and other hosts size:S (Resolved, okurz, 2023-10-09)

Related to openQA Infrastructure (public) - action #122848: Configure grouped alerts in Grafana correctly size:M (Resolved, okurz, 2023-01-09)

Related to openQA Infrastructure (public) - action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? (Resolved, nicksinger, 2023-07-20)

Related to openQA Infrastructure (public) - action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S (Resolved, dheidler, 2024-04-26)

Related to openQA Infrastructure (public) - action #159657: [alert] about "web UI: Too many 5xx HTTP responses alert" hidden behind grafana alert grouping (New, 2024-04-26)
