action #163775

closed

Conduct "lessons learned" with Five Why analysis about many alerts, e.g. alerts not silenced for known issues size:S

Added by okurz 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Organisational
Start date:
2024-07-10
Due date:
% Done:

0%

Estimated time:

Description

Motivation

For the past days, if not weeks, we have seen many alerts. Many of them were not handled according to our process, and many were not silenced even though we already knew about the underlying issue. This causes alert fatigue and leaves handling to the person on alert duty alone.

Acceptance criteria

  • AC1: A Five-Whys analysis has been conducted and results documented
  • AC2: Improvements are planned

Suggestions

  • Bring up in retro
  • Conduct "Five-Whys" analysis for the topic
  • Identify follow-up tasks in tickets
  • Organize a call to conduct the 5 whys (not as part of the retro)

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #163928: [alert] Openqa HTTP Response lost on 15-07-24 size:S (Resolved, okurz, 2024-07-15)

Copied from openQA Infrastructure (public) - action #163610: Conduct "lessons learned" with Five Why analysis for "[alert] (HTTP Response alert Salt tm0h5mf4k)" (Resolved, okurz, 2024-07-10)

Actions #1

Updated by okurz 5 months ago

  • Copied from action #163610: Conduct "lessons learned" with Five Why analysis for "[alert] (HTTP Response alert Salt tm0h5mf4k)" added
Actions #2

Updated by okurz 5 months ago

  • Subject changed from Conduct "lessons learned" with Five Why analysis about many alerts, e.g. alerts not silenced for known issues to Conduct "lessons learned" with Five Why analysis about many alerts, e.g. alerts not silenced for known issues size:S
  • Status changed from New to Workable
Actions #3

Updated by livdywan 5 months ago · Edited

  • Status changed from Workable to Feedback
  • Assignee set to livdywan

Let's discuss this on Wednesday 24 July afternoon. I feel like it would be good to consider #163610 in this context but I guess we can also do this one first.

Actions #4

Updated by livdywan 5 months ago

livdywan wrote in #note-3:

Let's discuss this on Wednesday 24 July afternoon. I feel like it would be good to consider #163610 in this context but I guess we can also do this one first.

Happening today at 14.00 Berlin time

Actions #5

Updated by livdywan 5 months ago

  • Related to action #163928: [alert] Openqa HTTP Response lost on 15-07-24 size:S added
Actions #6

Updated by livdywan 5 months ago

  • Status changed from Feedback to Resolved

Five whys

  1. Why don't we silence HTTP response alerts?
    • The HTTP response alert is very broad
    • Everyone knew about this anyway
    • The alert would tell us if issues have been addressed
    • A notification policy override can restrict the recipient of an alert
    • Suggestion: A notification policy is derived from the alert silence
  2. Why don't we silence pipelines?
    • People are not aware of the script
    • Conveniently VPN and GitLab are not usable
    • It is rather difficult to do
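
The suggestion under question 1 could be sketched roughly as follows. This is a minimal, self-contained Python model and not Grafana's actual API: the names `Matcher`, `Silence`, `PolicyOverride`, `policy_from_silence` and `route`, as well as the label value and receiver names, are hypothetical. It only illustrates reusing the matchers of a silence to narrow who receives an alert instead of muting it entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matcher:
    """One label=value condition (hypothetical, equality only)."""
    label: str
    value: str


@dataclass
class Silence:
    """What someone would otherwise create to mute a known issue."""
    matchers: list[Matcher]
    comment: str


@dataclass
class PolicyOverride:
    """Routes matching alerts to a single receiver instead of everyone."""
    matchers: list[Matcher]
    receiver: str


def policy_from_silence(silence: Silence, receiver: str) -> PolicyOverride:
    # Reuse the silence's matchers so the override covers exactly the
    # alerts the silence would have muted.
    return PolicyOverride(matchers=list(silence.matchers), receiver=receiver)


def route(alert_labels: dict[str, str],
          overrides: list[PolicyOverride],
          default_receiver: str) -> str:
    # The first override whose matchers all match wins; otherwise the
    # alert goes to the default receiver.
    for override in overrides:
        if all(alert_labels.get(m.label) == m.value for m in override.matchers):
            return override.receiver
    return default_receiver


# Assumed example values, for illustration only.
silence = Silence(
    matchers=[Matcher("alertname", "HTTP Response alert")],
    comment="known issue, see ticket",
)
override = policy_from_silence(silence, receiver="alert-duty")
```

With this model, `route({"alertname": "HTTP Response alert"}, [override], "team")` selects the narrower receiver, while an alert with different labels still reaches the default one. The alert keeps firing, so it can still tell us whether the issue has been addressed.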

We only answered two questions. Nevertheless, improvements to our documented process were made: https://progress.opensuse.org/projects/qa/wiki/Tools#Alert-handling
