coordination #105699
[epic] 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S
Description
Observation
See #105690
Acceptance criteria
- AC1: A Five-Whys analysis has been conducted and results documented
- AC2: Improvements are planned
Suggestions
- Bring up in retro
- Conduct "Five-Whys" analysis for the topic
- Identify follow-up tasks in tickets
- Organize a call to conduct the 5 whys (not as part of the retro)
Five Whys
- Why...?
- ...
- Why...?
- ...
- Why...?
- ...
- Why...?
- ...
- Why...?
- ...
Updated by livdywan almost 3 years ago
- Copied from action #105690: s390x svirt jobs incomplete with auto_review:"unable to extract assets:.*/var/lib/libvirt/images/a.img":retry added
Updated by livdywan almost 3 years ago
- Blocks action #104520: Move svirt extract_asset code from os-autoinst-distri-opensuse to os-autoinst/backend/svirt.pm size:M auto_review:"unable to extract assets: Can't call method.+name.+on an undefined value":retry added
Updated by mkittler almost 3 years ago
This is likely supposed to be discussed in a call. However, here are two questions I've been asking myself:
Why didn't I block any of the problematic PRs by requesting changes requiring a manual verification run?
It seemed a bit too much to ask for, so I only suggested having a look at https://github.com/Martchus/openQA-helper#testrun-svirt-backend-locally on the 2nd PR. Maybe we should be stricter when doing changes in important backends without automatic test coverage. I suppose there were even unit tests, but they don't seem to be enough (that's why we have the fullstack test, but it only covers the QEMU backend).
Why was svirt development not so problematic before (or was it)?
I suppose not many people dared to touch svirt code in the past. It was done more by the test maintainers themselves, who usually conduct manual verification runs. When I worked on the backend I tried to do the same, sometimes getting help from test maintainers. It was sometimes a lot of effort, though.
Updated by mkittler almost 3 years ago
- Assignee set to mkittler
One more question from my side to bring up in the retro:
- Why haven't we been able to create a better automatic fullstack test for the svirt backend? (I'm not saying that the test must necessarily be executed in a CI context. Having an automated test which runs locally utilizing hardware accessible via the internal network would already be a huge improvement.)
Updated by livdywan almost 3 years ago
Why did reaching out to experts not prevent this?
I made a point of reaching out to relevant maintainers before I even started looking into #104520, and keeping them in the loop. Arguably that gave me the confidence to even try. Except it doesn't look like it made much of a difference.
Updated by okurz almost 3 years ago
One important factor for myself is that I keep forgetting or overlooking certain things in reviews because I review in multiple different projects and each one requires looking for different things: in openQA I can usually trust the automated tests, assuming corresponding tests were added and code coverage is fine; for os-autoinst I would need to carefully try things myself or ask for tests; for openQA test distributions I would need to look for corresponding openQA verification jobs. I suggest we look into "review checklists". I am thinking of a bot like mergify that responds to pull requests with a review checklist which we can tweak per project, so that reviewers can cross off the individual points of that checklist.
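To illustrate the checklist-bot idea: a minimal sketch of a script that posts a per-project review checklist as a task-list comment on a pull request, assuming it is triggered from CI or a webhook handler and authenticated via a GitHub token. The repository names, the checklist items and the GITHUB_TOKEN environment variable are illustrative assumptions, not an existing tool's configuration.

```python
#!/usr/bin/env python3
"""Sketch of a review-checklist bot: post a per-project checklist on a pull request.

Assumes it is invoked (e.g. from CI or a webhook handler) with "owner/repo" and the
PR number; the checklists and the GITHUB_TOKEN variable are illustrative only.
"""
import os
import sys

import requests

# Checklists tweakable per project, as suggested above (items are examples).
CHECKLISTS = {
    "os-autoinst/openQA": [
        "Automated tests added and code coverage is fine",
        "Documentation updated where needed",
    ],
    "os-autoinst/os-autoinst": [
        "Unit tests added or extended",
        "Manual verification run linked for backend changes without automatic test coverage",
    ],
}


def post_checklist(repo: str, pr_number: int, token: str) -> None:
    """Post the repository's checklist as a task-list comment on the pull request."""
    items = CHECKLISTS.get(repo)
    if not items:
        return  # no checklist configured for this repository
    body = "Review checklist:\n" + "\n".join(f"- [ ] {item}" for item in items)
    # Pull request comments go through GitHub's issue-comment endpoint.
    response = requests.post(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
        json={"body": body},
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    # usage: checklist_bot.py <owner/repo> <pr-number>
    post_checklist(sys.argv[1], int(sys.argv[2]), os.environ["GITHUB_TOKEN"])
```

A hosted bot such as mergify could likely achieve the same with a per-repository rule that posts a comment; the sketch only shows the shape of the idea, with the checklist text kept tweakable per project while reviewers tick off the items on the PR.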
Updated by livdywan almost 3 years ago
- Subject changed from 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" to 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz almost 3 years ago
- Tracker changed from action to coordination
- Subject changed from 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S to [epic] 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S
- Status changed from Workable to In Progress
Five Whys
- Why did we start the work on the svirt backend in the first place?
- The backend was never capable of fully executing tests without relying on some specific code in os-autoinst-distri-opensuse, which we realized while working on another fix and feature on the os-autoinst side. There was a contribution, https://github.com/os-autoinst/os-autoinst/pull/1741, which had an unexpected impact on the svirt backend and caused tests to fail; we only realized this after merging, so the PR was reverted. This is when we started the work on #104520, which is still open, as is https://github.com/os-autoinst/os-autoinst/pull/1898
- Why didn't we block any of the problematic PRs by requesting changes requiring a manual verification run?
- It seemed a bit too much to ask for, so mkittler only suggested having a look at https://github.com/Martchus/openQA-helper#testrun-svirt-backend-locally on the 2nd PR. Maybe we should be stricter when doing changes in important backends without automatic test coverage. There were even unit tests, but they don't seem to be enough (that's why we have the fullstack test, but it only covers the QEMU backend). We were also not sure what actually could be tested here
-> 2-1: Ask for manual verification on review
=> okurz: Added a note to https://progress.opensuse.org/projects/qa/wiki/Tools#Definition-of-DONE
=> #106783
-> 2-2: Simplify the setup of "local svirt environment" and include it officially in os-autoinst (not just "Martchus" special)
=> #106867
-> 2-3: Also describe how to take out a production worker for testing
=> #106996
-> 2-4: Experiment again using custom openQA backend commands so that we can even use the os-autoinst version from a PR in a container
=> #106999
- Why was svirt development not so problematic before (or was it)?
- Not many people dared to touch svirt code in the past. It was done more by the test maintainers themselves, who usually conduct manual verification runs. When mkittler worked on the backend he tried to do the same, sometimes getting help from test maintainers. It was sometimes a lot of effort, though. We should improve our workflow
-> 3-1: Potentially be more conservative regarding changing existing code (but rather introduce separate code)
- Why haven't we been able to create a better automatic fullstack test for the svirt backend? (The test would not necessarily need to be executed in a CI context; an automated test which runs locally, utilizing hardware accessible via the internal network, would already be a huge improvement.)
-> 4-1: It simply wasn't done, which does not need to stop us :) okurz suggests first reaching complete statement coverage, even though that is limited, and then extending system-level tests
- Why did we not ask experts to test this before merging?
- Plans were reviewed, which is good. But this does not cover testing the implementation.
-> 5-1: We should explicitly ask domain experts and previous contributors to review and test
-> 5-2: One important factor for myself (okurz) is that I keep forgetting or overlooking certain things in reviews because I review in multiple different projects and each one requires looking for different things: in openQA I can usually trust the automated tests, assuming corresponding tests were added and code coverage is fine; for os-autoinst I would need to carefully try things myself or ask for tests; for openQA test distributions I would need to look for corresponding openQA verification jobs. I suggest we look into "review checklists": a bot like mergify that responds to pull requests with a review checklist which we can tweak per project, so that reviewers can cross off the individual points of that checklist.
=> #107005
Updated by mkittler almost 3 years ago
- Status changed from In Progress to Feedback
@okurz Thanks for summarizing the results. Is there anything left to do as part of this ticket? There are many suggestions but I'm not sure which ones should go into a follow-up ticket.
Updated by okurz almost 3 years ago
mkittler wrote:
@okurz Thanks for summarizing the results. Is there anything left to do as part of this ticket? There are many suggestions but I'm not sure which ones should go into a follow-up ticket.
If in doubt, all of the suggestions should go into follow-up tickets; at best only simple suggestions go into #65271. If you can and want to, implement the suggestions directly, e.g. automatic checklists for review.
Updated by okurz almost 3 years ago
- Status changed from Feedback to In Progress
- Assignee changed from mkittler to okurz
As discussed in the daily on 2022-02-14, I will take over for the follow-up
Updated by openqa_review almost 3 years ago
- Due date set to 2022-03-01
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz almost 3 years ago
- Category changed from Regressions/Crashes to Feature requests
- Status changed from In Progress to Blocked
Created the corresponding subtasks