action #138044
closed
Grouped seemingly unrelated alert emails are confusing size:M
Added by livdywan about 1 year ago.
Updated 11 months ago.
Description
Observation
The email subject says FIRING:2, and it points to multiple seemingly unrelated alerts including
- Packet loss between worker hosts and other hosts alert
- Failed systemd services alert (except openqa.suse.de)
which leads to confusion as to what's being alerted about.
Acceptance criteria
- AC1: Alert email subject lines make it possible to identify the related alerts
- AC2: We are still not receiving "too many" (subjective) emails for related alerts
Suggestions
- Clarify why these alerts are being grouped
- Find a way to give the email a useful subject
- Group/ungroup alerts based on clear patterns
- If there is no other way, consider ungrouping completely again, in coordination with the team
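To make the suggestions about a useful subject and pattern-based grouping more concrete, here is a minimal sketch in Python. It is not Grafana's API or template syntax; the alert dictionaries, the `group_alerts` and `subject_for` helpers, and the choice of `alertname` as the grouping label are all assumptions made for illustration. It only shows the idea: group by an explicit label, then name the grouped alerts in the subject instead of showing a bare count.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname",)):
    """Group alert dicts by the values of the chosen labels (hypothetical helper)."""
    groups = defaultdict(list)
    for alert in alerts:
        # Missing labels fall back to an empty string so they still form a key.
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def subject_for(alerts, max_names=3):
    """Build a subject that lists the distinct firing alert names (hypothetical helper)."""
    firing = [a for a in alerts if a.get("status") == "firing"]
    names = sorted({a["labels"].get("alertname", "unknown") for a in firing})
    shown = ", ".join(names[:max_names])
    if len(names) > max_names:
        shown += f" (+{len(names) - max_names} more)"
    return f"[FIRING:{len(firing)}] {shown}"


# Assumed data shape, using the two alert names from the observation above.
alerts = [
    {"status": "firing",
     "labels": {"alertname": "Packet loss between worker hosts and other hosts alert"}},
    {"status": "firing",
     "labels": {"alertname": "Failed systemd services alert (except openqa.suse.de)"}},
]

print(subject_for(alerts))
for key, members in group_alerts(alerts).items():
    print(key, len(members))
```

With grouping by `alertname`, the two alerts from the observation would end up in separate groups; a broader grouping label would keep them together, in which case a subject built this way would at least name both.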
- Related to action #137600: [alert] Packet loss between worker hosts and other hosts size:S added
- Related to action #122848: Configure grouped alerts in Grafana correctly size:M added
- Related to action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec added
- Related to deleted (action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec)
- Subject changed from Grouped seemingly unrelated alert emails are confusing to Grouped seemingly unrelated alert emails are confusing size:M
- Description updated (diff)
- Priority changed from High to Low
- Target version changed from Ready to Tools - Next
- Related to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
- Status changed from New to Workable
- Target version changed from Tools - Next to Ready
- Status changed from Workable to Blocked
- Assignee set to okurz
- Status changed from Blocked to Rejected
@livdywan As we closed #133130 without any production changes, I opt to reject this ticket as well, as I don't see anything actionable for us right now. nicksinger also mentioned that we should consider redoing how we manage alert templates, including the grouping "folders", but I see that further down the roadmap. If you feel strongly about this ticket, feel welcome to reopen it, but please reconsider the description to make clearer what is required and what we could do to improve.
- Related to action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S added
- Related to action #159657: [alert] about "web UI: Too many 5xx HTTP responses alert" hidden behind grafana alert grouping added