action #132437
Closed: Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems size:M
Description
Motivation
From the retrospective on 2023-07-07 we identified that we had "many alerts recently". One point we need to ensure is that everybody in SUSE QE Tools knows how to silence alerts in the various monitoring systems, in particular when alerts are related to our current work, so that others do not look into alerts assuming they are unhandled.
Acceptance criteria
- AC1: The majority of SUSE QE Tools knows how to silence alerts in Grafana, Zabbix, GitLab CI, openqa-logwarn, openQA "unknown issues" messages, etc.
Suggestions
- In a common team meeting, go over all systems mentioned in AC1 with the team, show how silencing works in each of them and clarify questions
- As needed, extend "Alert handling" as documented in the wiki or the salt states README
Updated by okurz over 1 year ago
- Subject changed from Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems to Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by livdywan over 1 year ago
- Description updated (diff)
- Status changed from Workable to In Progress
- Assignee set to livdywan
The wiki, for example, pretty much assumes that Grafana is the only place where we see alerts, so I'll start by adding the other alert sources here.
Updated by livdywan over 1 year ago
livdywan wrote in #note-2:
The wiki, for example, pretty much assumes that Grafana is the only place where we see alerts, so I'll start by adding the other alert sources here.
I'm not actually sure myself how to silence or pause Munin, Zabbix, "Unknown issue" or logwarn alerts. Let's see if others know.
Updated by openqa_review over 1 year ago
- Due date set to 2023-10-25
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan over 1 year ago
I added a note on suppressing problems in Zabbix. Unfortunately I couldn't really test it yet.
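For reference, suppressing a Zabbix problem can also be done through the Zabbix JSON-RPC API. The following is a minimal sketch, assuming Zabbix 6.2 or later (where manual problem suppression exists) and the `event.acknowledge` method, whose `action` parameter is a bitmask (4 = add message, 32 = suppress) with `suppress_until` as a Unix timestamp. `build_suppress_request` is a hypothetical helper that only builds the payload. It would still need to be POSTed to the instance's `api_jsonrpc.php` with an API token, and how the token is passed depends on the Zabbix version.

```python
import json
import time

# event.acknowledge "action" is a bitmask (assumption: Zabbix >= 6.2 API docs)
ACTION_ADD_MESSAGE = 4
ACTION_SUPPRESS = 32


def build_suppress_request(event_ids, hours, message, request_id=1):
    """Build (but do not send) a JSON-RPC payload that suppresses problems.

    Hypothetical helper: event_ids are the problem event IDs shown in the
    Zabbix frontend, hours is how long the suppression should last.
    """
    suppress_until = int(time.time()) + hours * 3600
    return {
        "jsonrpc": "2.0",
        "method": "event.acknowledge",
        "params": {
            "eventids": [str(e) for e in event_ids],
            # Suppress the problem and leave a comment explaining who handles it
            "action": ACTION_SUPPRESS | ACTION_ADD_MESSAGE,
            "message": message,
            "suppress_until": suppress_until,
        },
        "id": request_id,
    }


if __name__ == "__main__":
    payload = build_suppress_request([12345], 4, "Known, handled in poo#132437")
    print(json.dumps(payload, indent=2))
```

Adding a message together with the suppression keeps the "somebody is already on it" information visible to others, which is the point of silencing in the first place.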
Updated by livdywan over 1 year ago
Open questions:
- How should openqa-logwarn issues be silenced?
- Or should we document that we don't really do that, since we would rather address these, e.g. by proposing a change to logging in the relevant components?
- How should "Unknown issue" emails be silenced?
- By filing and adding a ticket to the affected job?
Updated by okurz over 1 year ago
livdywan wrote in #note-7:
Open questions:
- How should openqa-logwarn issues be silenced?
- Or should we document that we don't really do that, since we would rather address these, e.g. by proposing a change to logging in the relevant components?
By adding "known issues" to https://github.com/os-autoinst/openqa-logwarn/blob/master/logwarn_openqa
- How should "Unknown issue" emails be silenced?
- By filing and adding a ticket to the affected job?
yes
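Referencing the filed ticket on the affected job can be scripted as well. A minimal sketch, assuming openQA's REST endpoint `POST /api/v1/jobs/<id>/comments` with a `text` parameter and the `poo#<number>` / `bsc#<number>` reference style; `build_label_request` is a hypothetical helper that only builds the request. Sending it requires openQA API key authentication, which `openqa-cli api -X POST jobs/<id>/comments text=poo#<number>` handles for you.

```python
import re
from urllib.parse import urlencode, urljoin

# Assumption: tickets are referenced as poo#<n> (progress) or bsc#<n> (Bugzilla)
TICKET_RE = re.compile(r"^(poo|bsc)#\d+$")


def build_label_request(openqa_host, job_id, ticket):
    """Return (url, form_body) for commenting a ticket reference on a job.

    Hypothetical helper: nothing is sent, the caller still has to POST the
    body with proper openQA API authentication.
    """
    if not TICKET_RE.match(ticket):
        raise ValueError(f"expected a reference like 'poo#12345', got {ticket!r}")
    url = urljoin(openqa_host.rstrip("/") + "/", f"api/v1/jobs/{int(job_id)}/comments")
    body = urlencode({"text": ticket})  # '#' is percent-encoded as %23
    return url, body


if __name__ == "__main__":
    print(build_label_request("https://openqa.example.com", 123, "poo#132437"))
```

Validating the reference format up front avoids leaving a comment that openQA would not recognize as a ticket label.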
Updated by livdywan over 1 year ago
okurz wrote in #note-8:
By adding "known issues" to https://github.com/os-autoinst/openqa-logwarn/blob/master/logwarn_openqa
That's the part I couldn't remember. There isn't a config file :-D We should document this properly.
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
livdywan wrote in #note-9:
Merged. Now to confirm that the team understands what is documented in our wiki. We definitely had some insightful conversations this week about things that weren't clear before.
Updated by livdywan over 1 year ago
As we realized that using "unknown" and "unreviewed" issues interchangeably is confusing, I'm attempting to rectify that by consistently using one term: either always "unreviewed", as in the email subject, or always "unknown". Let's see if others have strong opinions on this one: https://github.com/os-autoinst/scripts/pull/266
Updated by livdywan over 1 year ago
- Status changed from Feedback to Resolved
Everyone is somewhat comfortable with the updates. Let's remember to update the steps if we find any gaps later!