action #161324

closed

Conduct "lessons learned" with Five Why analysis for "osd not accessible, 502 Bad Gateway"

Added by okurz about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Organisational
Target version:
Start date: 2024-05-31
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

In #161309, OSD was down for multiple hours (still ongoing at the time of writing on 2024-05-31). We should learn what happened and find improvements for the future.

Acceptance criteria

  • AC1: A Five-Whys analysis has been conducted and results documented
  • AC2: Improvements are planned

Suggestions

  • Bring up in retro
  • Conduct "Five-Whys" analysis for the topic
  • Identify follow-up tasks in tickets
  • Organize a call to conduct the 5 whys (not as part of the retro)

Ideas


Related issues 5 (3 open, 2 closed)

  • Related to QA - action #132149: Coordinate with Eng-Infra to get simple management access to VMs (o3/osd/qa-jump.qe.nue2.suse.org) size:M (Blocked, okurz, 2023-06-29)
  • Related to openQA Infrastructure - action #161429: incomplete config files on OSD due to salt - create annotations in grafana on the time of the osd deployment as well as salt-states-openqa deployments (New, 2024-06-03)
  • Related to openQA Infrastructure - action #161426: incomplete config files on OSD due to salt - introduce post-deploy monitoring steps like in osd-deployment but in salt-states-openqa (New, 2024-06-03)
  • Related to openQA Infrastructure - action #161423: [timeboxed:10h] Incomplete config files on OSD due to salt - Improve salt state application from remotely accessible salt master size:S (Resolved, okurz, 2024-06-03)
  • Copied from openQA Infrastructure - action #161309: osd not accessible, 502 Bad Gateway (Resolved, jbaier_cz, 2024-05-31)
