action #138044

closed

Grouped seemingly unrelated alert emails are confusing size:M

Added by livdywan 8 months ago. Updated 5 months ago.

Status: Rejected
Priority: Low
Assignee:
Category: -
Target version:
Start date: 2023-10-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

The email subject says "FIRING:2", and the email refers to multiple seemingly unrelated alerts, including

  • Packet loss between worker hosts and other hosts alert
  • Failed systemd services alert (except openqa.suse.de)

which makes it unclear what is actually being alerted about.

Acceptance criteria

  • AC1: Alert email subject lines make it possible to identify the related alerts
  • AC2: We still do not receive "too many" (subjective) emails for related alerts

Suggestions

  • Clarify why these alerts are being grouped
  • Find a way to give the email a useful subject
  • Group/ungroup alerts based on clear patterns
  • If there is no other way, consider ungrouping completely again, in coordination with the team
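The grouping behaviour described above can be illustrated with a minimal Python sketch. It assumes Alertmanager-style grouping, where all alerts sharing the same values for the group-by labels end up in one notification, and a subject shaped like the default "[FIRING:N] <group label values>". The label keys grafana_folder and alertname, the folder value "openQA" and the helpers group_alerts, subject and subjects are hypothetical and do not reflect the actual openQA alert configuration.

```python
from collections import defaultdict

# Hypothetical firing alerts; the folder value "openQA" is an assumption.
ALERTS = [
    {"status": "firing",
     "labels": {"grafana_folder": "openQA",
                "alertname": "Packet loss between worker hosts and other hosts alert"}},
    {"status": "firing",
     "labels": {"grafana_folder": "openQA",
                "alertname": "Failed systemd services alert (except openqa.suse.de)"}},
]


def group_alerts(alerts, group_by):
    """Put alerts with identical values for the group-by labels into one group."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def subject(key, alerts):
    """Build a subject shaped like "[FIRING:N] <group label values>"."""
    firing = sum(1 for a in alerts if a["status"] == "firing")
    return f"[FIRING:{firing}] " + " ".join(key)


def subjects(alerts, group_by):
    """Return one email subject per notification group."""
    return [subject(k, v) for k, v in group_alerts(alerts, group_by).items()]


if __name__ == "__main__":
    # Grouping by folder only: one email for both unrelated alerts.
    for line in subjects(ALERTS, ["grafana_folder"]):
        print(line)
    # Grouping by folder and alert name: one email per alert.
    for line in subjects(ALERTS, ["grafana_folder", "alertname"]):
        print(line)
```

Grouping only by the folder yields a single "[FIRING:2] openQA" email that does not name either alert, while adding alertname to the group-by keys splits it into one email per alert with the alert name in the subject. That is the trade-off between AC1 and AC2.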

Related issues 5 (1 open, 4 closed)

Related to openQA Infrastructure - action #137600: [alert] Packet loss between worker hosts and other hosts size:S (Resolved, okurz, 2023-10-09)

Related to openQA Infrastructure - action #122848: Configure grouped alerts in Grafana correctly size:M (Resolved, okurz, 2023-01-09)

Related to openQA Infrastructure - action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? (Resolved, nicksinger, 2023-07-20)

Related to openQA Infrastructure - action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S (Resolved, dheidler, 2024-04-26)

Related to openQA Infrastructure - action #159657: [alert] about "web UI: Too many 5xx HTTP responses alert" hidden behind grafana alert grouping (New, 2024-04-26)

#1

Updated by livdywan 8 months ago

  • Related to action #137600: [alert] Packet loss between worker hosts and other hosts size:S added

#3

Updated by tinita 8 months ago

  • Related to action #122848: Configure grouped alerts in Grafana correctly size:M added

#4

Updated by okurz 8 months ago

  • Related to action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec added

#5

Updated by okurz 8 months ago

  • Related to deleted (action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec)

#6

Updated by okurz 8 months ago

  • Subject changed from Grouped seemingly unrelated alert emails are confusing to Grouped seemingly unrelated alert emails are confusing size:M
  • Description updated (diff)
  • Priority changed from High to Low
  • Target version changed from Ready to Tools - Next

#7

Updated by okurz 8 months ago

  • Related to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added

#8

Updated by okurz 8 months ago

  • Status changed from New to Workable

#9

Updated by okurz 6 months ago

  • Target version changed from Tools - Next to Ready

#10

Updated by okurz 6 months ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

Waiting for #133130 first

#11

Updated by okurz 5 months ago

  • Status changed from Blocked to Rejected

@livdywan As we actually closed #133130 without any production changes, I opt to also reject this ticket, as I don't see anything actionable for us right now. nicksinger also mentioned that we should consider redoing how we manage alert templates, including the grouping "folders", but I see that further down the roadmap, so I am rejecting this ticket. If you feel strongly about this ticket, feel welcome to reopen it, but please reconsider the description to make it clearer what's required and what we could do to improve.

#12

Updated by okurz about 2 months ago

  • Related to action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S added

#13

Updated by okurz about 2 months ago

  • Related to action #159657: [alert] about "web UI: Too many 5xx HTTP responses alert" hidden behind grafana alert grouping added