action #180863

closed

coordination #154777: [saga][epic] Shareable os-autoinst and test distribution plugins

coordination #162131: [epic] future version control related features in openQA

Conduct lessons learned "Five Why" analysis for "Gracious handling of longer remote git clones outages" size:S

Added by livdywan about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Let's discuss why #179038 took so long.

Background

So what happened?

Questions

  1. Why did the ticket take so long?
    • A1-1: Mostly because the "wrong approach" involving retries was taken in the beginning
    • => I1-1-1: See I2-1-1
    • => I1-1-2: Do more TDD, for real, don't cheat, real TDD, we mean it!
    • A1-2: Also because implementing the code review suggestions took a long time
    • => I1-2-1: We should try more focused team-work also supported by a Scrum Master in each (small) team
    • A1-3: Because of a misunderstanding of test code that mocks something …
    • => I1-3-1: Do TDD, see I1-1-2, seriously
  2. Why was the "wrong approach" using a for-loop taken?
    • A2-1: Maybe the ticket description was not read or followed properly
    • => I2-1-1: Remind ourselves to read ticket descriptions carefully and crosscheck that in dailies
    • A2-2: Unfamiliarity with the code, in particular the "minion" framework
    • => I2-2-1: Use the collab sessions or do pair-programming or ask for code walkthroughs more often
  3. Why was the "whole thing" implemented even though during refinement we clearly split out #179185 "Detection of long-time remote git clone outages"?
    • A3-1: Because the collab sessions were not effective
  4. Why did the collab sessions not help?
    • A4-1: Because new code review comments kept coming up
    • => I4-1-1: See I5-1-1
    • A4-2: We were too confident in the approach
    • => I4-2-1: Cross-check more often if the taken approach is still correct, e.g. in dailies be more specific about steps planned
  5. Why did new code review comments come up repeatedly?
    • A5-1: Because it only became apparent during code review that the "single file" approach would not work with multiple git providers
    • => I5-1-1: Do more pair-programming sessions rather than accumulating 100+ GitHub PR comments

Acceptance criteria

  • AC1: A Five-Whys analysis has been conducted and results documented
  • AC2: Improvements are planned

Suggestions

  • Bring up in retro
  • Conduct a "Five-Whys" analysis for the topic
  • Identify follow-up tasks in tickets
  • Organize a call to conduct the 5 whys

Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #179038: Gracious handling of longer remote git clones outages size:S | Resolved | mkittler | 2025-03-17

#1

Updated by livdywan about 2 months ago

  • Copied from action #179038: Gracious handling of longer remote git clones outages size:S added
#2

Updated by livdywan about 2 months ago

Note that I'm using the same format and hence consider this already estimated. Please let me know if anyone would still prefer to discuss this beforehand.

#3

Updated by okurz about 2 months ago

But most of the ticket description does not make sense

#4

Updated by livdywan about 2 months ago

  • Description updated (diff)

okurz wrote in #note-3:

But most of the ticket description does not make sense

Sorry. Apparently the previous ticket was saved rather than the empty template. Fixed now 🙃

#5

Updated by livdywan about 2 months ago

Let's postpone until next week, when Rob is able to join. I didn't realize he would not be available this afternoon 🤦🏼

#6

Updated by okurz about 2 months ago

  • Copied to action #181184: Conduct lessons learned "Five Why" analysis for "Lessons learned for "OSD is down since 2025-04-19 due to accidental user actions removing parts of the root filesystem" size:S added
#7

Updated by okurz about 2 months ago

  • Copied to deleted (action #181184: Conduct lessons learned "Five Why" analysis for "Lessons learned for "OSD is down since 2025-04-19 due to accidental user actions removing parts of the root filesystem" size:S)
#8

Updated by livdywan about 1 month ago

  • Tags set to collaborative-session
#9

Updated by okurz about 1 month ago

  • Tags changed from collaborative-session to collaborative-session, five why
#10

Updated by livdywan about 1 month ago

  • Tags changed from collaborative-session, five why to collaborative-session
  • Description updated (diff)
  • Status changed from Workable to Resolved
#11

Updated by livdywan about 1 month ago

  • Tags changed from collaborative-session to collaborative-session, five why