action #132437
Closed: Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems size:M
Description
Motivation
In the retro on 2023-07-07 we noted that we had "many alerts recently". One thing we need to ensure is that everybody in SUSE QE Tools knows how to silence alerts in the various monitoring systems. This matters most when alerts relate to work we are currently doing, so that others do not investigate alerts on the assumption that nobody is handling them.
Acceptance criteria
- AC1: The majority of SUSE QE Tools knows how to silence alerts in Grafana, Zabbix, GitLab CI, openqa-logwarn, openQA "unknown issues" messages, etc.
Suggestions
- In a common team meeting, go over all systems mentioned in AC1 with the team, show how silencing works in each of them and clarify questions
- As needed, extend the "Alert handling" documentation in the wiki or the salt states README
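For Grafana, a silence can also be created programmatically instead of through the UI. Below is a minimal sketch, assuming the instance uses Grafana unified alerting with its built-in Alertmanager, where silences are created via POST to `/api/alertmanager/grafana/api/v2/silences`. The helper name `build_silence`, the alert name and the ticket reference in the comment are illustrative only. The sketch only builds the request body and does not send anything.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: Grafana unified alerting with the built-in Alertmanager,
# which accepts this body at /api/alertmanager/grafana/api/v2/silences.


def build_silence(alertname: str, hours: float, author: str, comment: str) -> dict:
    """Return a silence body matching one alert rule by its alertname label."""
    now = datetime.now(timezone.utc)
    return {
        "matchers": [
            # Exact, non-regex equality match on the alertname label
            {"name": "alertname", "value": alertname, "isRegex": False, "isEqual": True}
        ],
        # RFC 3339 timestamps; the silence expires on its own after `hours`
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": author,
        # Reference the ticket so others can see the alert is already handled
        "comment": comment,
    }


if __name__ == "__main__":
    body = build_silence("example-alert", 24, "someone", "Handled in poo#132437")
    print(json.dumps(body, indent=2))
```

Giving the silence a fixed end time and a ticket reference in the comment means it cannot be forgotten indefinitely, and anyone looking at the alert can see who is working on it.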
Updated by okurz about 1 year ago
- Subject changed from Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems to Ensure everybody in SUSE QE Tools knows how to silence alerts in various monitoring systems size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by livdywan 11 months ago
livdywan wrote in #note-2:
The wiki, for example, pretty much assumes Grafana is the only place where we see alerts, so I'll start by adding the other alert sources there.
I'm not actually sure myself how to silence or pause Munin, Zabbix, "unknown issue" or logwarn alerts. Let's see if others know.
Updated by openqa_review 11 months ago
- Due date set to 2023-10-25
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan 11 months ago
I added a note on suppressing problems in Zabbix. Unfortunately I couldn't really test it yet.
Updated by livdywan 11 months ago
Open questions:
- How should openqa-logwarn issues be silenced?
- Or should we document that we don't really do that, since we would rather address these, e.g. by proposing a change to the logging in the relevant components?
- How should "Unknown issue" emails be silenced?
- By filing and adding a ticket to the affected job?
Updated by okurz 11 months ago
livdywan wrote in #note-7:
Open questions:
- How should openqa-logwarn issues be silenced?
- Or should we document that we don't really do that, since we would rather address these, e.g. by proposing a change to the logging in the relevant components?
By adding "known issues" to https://github.com/os-autoinst/openqa-logwarn/blob/master/logwarn_openqa
- How should "Unknown issue" emails be silenced?
- By filing and adding a ticket to the affected job?
yes
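A sketch of the "add a ticket to the affected job" approach follows. It makes a simplifying assumption: a job counts as reviewed once any of its comments references a ticket. The regex covers only a hypothetical subset of the references openQA recognizes. The `openqa-cli api -X POST jobs/<id>/comments text=...` call is the assumed way to add such a comment. The helpers `is_reviewed` and `comment_command` are illustrative and not part of any existing script.

```python
import re
from typing import List

# Assumption: a simplified subset of ticket references recognized in openQA
# job comments (poo# = progress.opensuse.org, bsc#/boo# = SUSE/openSUSE Bugzilla).
TICKET_REF = re.compile(r"\b(?:poo|bsc|boo)#\d+\b")


def is_reviewed(comments: List[str]) -> bool:
    """Return True if any comment on a job references a ticket."""
    return any(TICKET_REF.search(comment) for comment in comments)


def comment_command(host: str, job_id: int, ticket: str) -> List[str]:
    """Build an openqa-cli call that adds a ticket reference as a job comment."""
    return [
        "openqa-cli", "api", "--host", host,
        "-X", "POST", f"jobs/{job_id}/comments",
        f"text={ticket}",
    ]


if __name__ == "__main__":
    print(is_reviewed(["looks flaky, retriggered"]))
    print(" ".join(comment_command("https://openqa.opensuse.org", 123, "poo#132437")))
```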
Updated by livdywan 11 months ago
okurz wrote in #note-8:
By adding "known issues" to https://github.com/os-autoinst/openqa-logwarn/blob/master/logwarn_openqa
That's the part I couldn't remember. There isn't a config file :-D We should document this properly.
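The sketch below is a simplified model of how a known-issue entry silences a logwarn report: warning lines that match a known-issue pattern are dropped before anything is reported. It is not logwarn's actual implementation, and it does not show the real format of `logwarn_openqa`. The `[warn]`/`[error]` detection, the example pattern and the function name `reported_lines` are assumptions for illustration.

```python
import re
from typing import List, Sequence

# Hypothetical known-issue patterns for illustration only; the real
# entries live in logwarn_openqa in os-autoinst/openqa-logwarn.
KNOWN_ISSUES = [
    r"Connection refused.*example-host",
]
# Assumption: warnings are recognized by a "[warn]" or "[error]" tag.
WARNING = re.compile(r"\[(?:warn|error)\]", re.IGNORECASE)


def reported_lines(lines: Sequence[str], known_issues: Sequence[str] = KNOWN_ISSUES) -> List[str]:
    """Return the warning lines that no known-issue pattern silences."""
    ignore = [re.compile(pattern) for pattern in known_issues]
    reported = []
    for line in lines:
        if not WARNING.search(line):
            continue  # not a warning, never reported
        if any(pattern.search(line) for pattern in ignore):
            continue  # matches a known issue, so it is silenced
        reported.append(line)
    return reported


if __name__ == "__main__":
    log = [
        "[info] all good",
        "[warn] Connection refused to example-host",
        "[error] disk full",
    ]
    print(reported_lines(log))
```

Silencing therefore means making a pattern more specific or adding a new one. A pattern that is too broad hides unrelated problems.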
Updated by livdywan 11 months ago
- Status changed from In Progress to Feedback
livdywan wrote in #note-9:
Merged. Now to confirm that the team understands what's documented in our wiki. Definitely had some insightful conversations this week about things that weren't clear before.
Updated by livdywan 11 months ago
We realized it's confusing to use "unknown" and "unreviewed" issues interchangeably, so I'm trying to fix that by consistently using one term: either "unreviewed", as in the email subject, or "unknown". Let's see if others have strong opinions on this one: https://github.com/os-autoinst/scripts/pull/266