action #138044
closed
Grouped seemingly unrelated alert emails are confusing size:M
Start date: 2023-10-09
Due date:
% Done: 0%
Estimated time:
Description
Observation
The email subject says FIRING:2, but the email points to multiple seemingly unrelated alerts, including
- Packet loss between worker hosts and other hosts alert
- Failed systemd services alert (except openqa.suse.de)
This makes it unclear what is actually being alerted about.
Acceptance criteria
- AC1: Alert email subject lines make it possible to identify related alerts
- AC2: We are still not receiving "too many" (subjective) emails for related alerts
Suggestions
- Clarify why these alerts are being grouped
- Find a way to give the email a useful subject
- Group/ungroup alerts based on clear patterns
- If there is no other way, maybe ungroup again completely, in coordination with the team
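The grouping and subject suggestions above could be prototyped outside Grafana first. The following is only a minimal sketch in Python, not the actual Grafana configuration: it assumes each alert is a dict with hypothetical `alertname` and `group` keys, and the `network` and `systemd` group values are made up for illustration. It groups alerts by one label and builds a subject line that names the alerts instead of showing only a count.

```python
from collections import defaultdict


def group_alerts(alerts, key="group"):
    """Group alert dicts by the value of `key` (a hypothetical label name)."""
    grouped = defaultdict(list)
    for alert in alerts:
        # Alerts that lack the label are collected in an "ungrouped" bucket.
        grouped[alert.get(key, "ungrouped")].append(alert)
    return dict(grouped)


def build_subject(group_name, alerts, max_names=3):
    """Build a subject that lists alert names rather than only a count."""
    names = sorted({a["alertname"] for a in alerts})
    shown = ", ".join(names[:max_names])
    if len(names) > max_names:
        shown += f", +{len(names) - max_names} more"
    return f"[FIRING:{len(alerts)}] {group_name}: {shown}"


# Alert names are taken from the observation; the group labels are assumptions.
alerts = [
    {"alertname": "Packet loss between worker hosts and other hosts alert", "group": "network"},
    {"alertname": "Failed systemd services alert (except openqa.suse.de)", "group": "systemd"},
]
subjects = [build_subject(name, items) for name, items in group_alerts(alerts).items()]
```

With a label that separates the two alerts, each email would carry one alert and a subject naming it. Grouping on a coarser label would keep them in one email but still list both names in the subject.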
Updated by livdywan about 1 year ago
- Related to action #137600: [alert] Packet loss between worker hosts and other hosts size:S added
Updated by tinita about 1 year ago
- Related to action #122848: Configure grouped alerts in Grafana correctly size:M added
Updated by okurz about 1 year ago
- Related to action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec added
Updated by okurz about 1 year ago
- Related to deleted (action #137270: [FIRING:1] host_up (malbec: host up alert openQA malbec host_up_alert_malbec worker) and similar about malbec)
Updated by okurz about 1 year ago
- Subject changed from Grouped seemingly unrelated alert emails are confusing to Grouped seemingly unrelated alert emails are confusing size:M
- Description updated (diff)
- Priority changed from High to Low
- Target version changed from Ready to Tools - Next
Updated by okurz about 1 year ago
- Related to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Updated by okurz almost 1 year ago
- Target version changed from Tools - Next to Ready
Updated by okurz 11 months ago
- Status changed from Blocked to Rejected
@livdywan as we have closed #133130 without any production changes, I opt to also reject this ticket, as I don't see anything actionable for us right now. Also, nicksinger mentioned that we should consider redoing how we manage alert templates, including the grouping "folders", but I see that further down the roadmap, so I am rejecting this ticket. If you feel strongly about this ticket, feel welcome to reopen it, but please reconsider the description to make it clearer what's required and what we could do to improve.
Updated by okurz 7 months ago
- Related to action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S added
Updated by okurz 7 months ago
- Related to action #159657: [alert] about "web UI: Too many 5xx HTTP responses alert" hidden behind grafana alert grouping added