action #105690
s390x svirt jobs incomplete with auto_review:"unable to extract assets:.*/var/lib/libvirt/images/a.img":retry (closed)
Description
Observation
Jobs are incomplete with "qemu-img: Could not open '/var/lib/libvirt/images/a.img': Could not open '/var/lib/libvirt/images/a.img': No such file or directory", for instance https://openqa.suse.de/tests/8047803 :
[2022-01-28T09:34:25.924322+01:00] [debug] running `nice ionice qemu-img convert -p -O qcow2 /var/lib/libvirt/images/a.img assets_public/sle-12-SP4-s390x-4.12.14-41.1.g82b276a-Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-sle12-with-ltp.qcow2 -c`
[2022-01-28T09:34:25.930323+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 398978 and exit status: 0
[2022-01-28T09:34:25.930519+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 399590 and exit status: 0
[2022-01-28T09:34:25.934036+01:00] [debug] (0.00/100%)
qemu-img: Could not open '/var/lib/libvirt/images/a.img': Could not open '/var/lib/libvirt/images/a.img': No such file or directory
[2022-01-28T09:34:25.934973+01:00] [warn] !!! bmwqemu::serialize_state: unable to extract assets: runcmd 'nice ionice qemu-img convert -p -O qcow2 /var/lib/libvirt/images/a.img assets_public/sle-12-SP4-s390x-4.12.14-41.1.g82b276a-Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-sle12-with-ltp.qcow2 -c' failed with exit code 1: ' (0.00/100%)
qemu-img: Could not open '/var/lib/libvirt/images/a.img': Could not open '/var/lib/libvirt/images/a.img': No such file or directory
' at /usr/lib/os-autoinst/osutils.pm line 109.
osutils::runcmd("nice", "ionice", "qemu-img", "convert", "-p", "-O", "qcow2", "/var/lib/libvirt/images/a.img", ...) called at /usr/lib/os-autoinst/backend/svirt.pm line 189
backend::svirt::do_extract_assets(backend::svirt=HASH(0x100359af2d8), HASH(0x100349611d8)) called at /usr/lib/os-autoinst/backend/driver.pm line 80
backend::driver::extract_assets(backend::driver=HASH(0x1003621a9f8), HASH(0x100349611d8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 163
eval {...} called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Utils.pm line 163
OpenQA::Isotovideo::Utils::handle_generated_assets(OpenQA::Isotovideo::CommandHandler=HASH(0x100359c7600), 1) called at /usr/bin/isotovideo line 409
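The auto_review expression in the ticket title is what openqa-label-known-issues looks for in job logs to label such incompletes. A minimal sketch of that kind of check follows. It uses Python's re.search as a stand-in for the tool's real matcher, which is an assumption. The expression matches the warning above but not a hypothetical variant of the same line that uses the expected, name-prefixed image path:

```python
import re

# auto_review expression from the ticket title, without the
# auto_review:"...":retry wrapper. The unescaped "." in "a.img"
# matches any character, which is harmless here.
AUTO_REVIEW = r"unable to extract assets:.*/var/lib/libvirt/images/a.img"

# Warning line copied from the failing job's autoinst-log.txt (first line only).
FAILING_LINE = (
    "[2022-01-28T09:34:25.934973+01:00] [warn] !!! bmwqemu::serialize_state: "
    "unable to extract assets: runcmd 'nice ionice qemu-img convert -p -O qcow2 "
    "/var/lib/libvirt/images/a.img "
    "assets_public/sle-12-SP4-s390x-4.12.14-41.1.g82b276a-Server-DVD-Incidents-"
    "Kernel-KOTD@s390x-kvm-sle12-with-ltp.qcow2 -c' failed with exit code 1: ' (0.00/100%)"
)

# Hypothetical line: same failure, but with the image name seen in the last good job.
HYPOTHETICAL_LINE = FAILING_LINE.replace("images/a.img", "images/openQA-SUT-3a.img")


def matches_auto_review(pattern: str, log_text: str) -> bool:
    """Return True if the pattern occurs anywhere in the log text."""
    return re.search(pattern, log_text) is not None


print(matches_auto_review(AUTO_REVIEW, FAILING_LINE))       # True
print(matches_auto_review(AUTO_REVIEW, HYPOTHETICAL_LINE))  # False
```

The expression is therefore specific to the broken "a.img" path and would not relabel extraction failures that involve a correctly named image.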
It seems that https://github.com/os-autoinst/os-autoinst/pull/1936 is the culprit. Complete changelog since the last good job:
openQA changes:
- Thu Jan 27 2022 okurz@suse.com
  - Update to version 4.6.1643299616.01abba344:
    - Add more perl signatures
    - Add more perl signatures
    - Simplify code for assigning job settings in create_from_settings
    - Use constant for referring to a job's main settings
    - Allow changing job settings via restart API
    - Check whether clones created by the restart API take over the group
    - Prevent error when restarting jobs with skip_parents=1
    - Use tidyall instead of custom implementation
- Tue Jan 25 2022 okurz@suse.com
  - Update to version 4.6.1643089984.09669586b:
    - Dependency cron 2022-01-22
    - Add signatures to OpenQA::App and OpenQA::BuildResults
    - templates: Fix wording for the "VNC display number"
    - Show number of restarts of a job within info box on details page
os-autoinst changes:
- Thu Jan 27 2022 okurz@suse.com
  - Update to version 4.6.1643273407.65ca16b7:
    - svirt: Store vmname early for use after test run
- Tue Jan 25 2022 okurz@suse.com
  - Update to version 4.6.1643061641.d319802b:
    - Continue further checks in fullstack test after one fails
    - Add OBS workflow
    - Simplify base inheritance statement with Mojo::Base everywhere
    - Exclude 29-backend-driver.t from OBS checks
    - Simplify string concatenation in log.pm
    - Extract all log functions into new module "log"
    - Add test for defining/starting VM via VMware in svirt backend
    - Add test for generating XML file with UEFI loader in svirt backend
    - Use tidyall for faster tidying
    - Fix single, unnecessary UTF8 character in consoles::VNC
    - git subrepo pull (merge) external/os-autoinst-common
Reproducible
Always reproducible on s390x svirt jobs since that change was deployed.
Expected result
Last good job https://openqa.suse.de/tests/8019621/logfile?filename=autoinst-log.txt , showing:
[2022-01-23T01:41:54.907320+01:00] [debug] <<< testapi::type_string(string="nice ionice qemu-img convert -p -O qcow2 /var/lib/libvirt/images/openQA-SUT-3a.img /var/lib/libvirt/images/sle-12-SP4-s390x-4.12.14-40.1.g1475601-Server-DVD-Incidents-Kernel-KOTD\@s390x-kvm-sle12-with-ltp.qcow2 && echo OK", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-23T01:42:02.749003+01:00] [debug] tests/kernel/../shutdown/svirt_upload_assets.pm:60 called svirt_upload_assets::extract_assets -> tests/kernel/../shutdown/svirt_upload_assets.pm:32 called testapi::assert_screen
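One visible difference between the two logs is the source image. The last good job converts /var/lib/libvirt/images/openQA-SUT-3a.img, while the failing job tries /var/lib/libvirt/images/a.img. That looks as if the VM name prefix was empty when the path was built, which would fit the os-autoinst change "svirt: Store vmname early for use after test run", although this is an inference from the logs only. The sketch below is hypothetical. It assumes the path is composed as "<vmname><disk letter>.img", and `disk_image_path` and `checked_disk_image_path` are illustrative helpers, not os-autoinst functions:

```python
from pathlib import PurePosixPath

# Assumption: disk images live here and are named "<vmname><disk letter>.img",
# inferred from the good and bad log lines in this ticket.
IMAGE_DIR = PurePosixPath("/var/lib/libvirt/images")


def disk_image_path(vmname: str, disk_letter: str = "a") -> str:
    """Hypothetical helper composing the disk image path for a VM."""
    return str(IMAGE_DIR / f"{vmname}{disk_letter}.img")


def checked_disk_image_path(vmname: str, disk_letter: str = "a") -> str:
    """Like disk_image_path, but fails early if the VM name was lost."""
    if not vmname:
        raise ValueError("VM name is empty, cannot locate the disk image")
    return disk_image_path(vmname, disk_letter)


print(disk_image_path("openQA-SUT-3"))  # /var/lib/libvirt/images/openQA-SUT-3a.img
print(disk_image_path(""))              # /var/lib/libvirt/images/a.img
```

A guard like the one in `checked_disk_image_path` would report the lost VM name directly. Without it, the failure only surfaces later as a less telling "No such file or directory" from qemu-img.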
Workaround
Revert https://github.com/os-autoinst/os-autoinst/pull/1936 and retrigger the tests. The OSD deployment was rolled back with https://gitlab.suse.de/openqa/osd-deployment/-/jobs/813165
Updated by okurz over 2 years ago
- Description updated
mdoucha brought this up in https://suse.slack.com/archives/C02CANHLANP/p1643362926351179 . I have rolled back the deployment on OSD, proposed a revert in https://github.com/os-autoinst/os-autoinst/pull/1941 and will inform mdoucha.
Updated by okurz over 2 years ago
- Related to action #104520: Move svirt extract_asset code from os-autoinst-distri-opensuse to os-autoinst/backend/svirt.pm size:M auto_review:"unable to extract assets: Can't call method.+name.+on an undefined value":retry added
Updated by okurz over 2 years ago
- Status changed from New to In Progress
- Assignee set to okurz
Cris Dywan: Interesting. Those didn't show up in the output of "host=openqa.suse.de ./openqa-monitor-incompletes", which I was checking just a few minutes ago.
Oliver Kurz: Yes, because they had already received the label "missing_asset". That is something we might want to make less greedy. I am currently running a locally patched "openqa-monitor-incompletes" that also looks at already commented jobs.
Updated by livdywan over 2 years ago
okurz wrote:
Was brought up by mdoucha in https://suse.slack.com/archives/C02CANHLANP/p1643362926351179 . I have rolled back the deployment on OSD and proposed a revert https://github.com/os-autoinst/os-autoinst/pull/1941 and will inform mdoucha.
The rollback went through, but it brought back https://github.com/os-autoinst/os-autoinst/pull/1895 rather than the previous state.
@okurz has now proposed a revert of both in https://github.com/os-autoinst/os-autoinst/pull/1942
Updated by livdywan over 2 years ago
- Copied to coordination #105699: [epic] 5 whys follow-up to s390x svirt jobs incomplete with unable to extract assets:.*/var/lib/libvirt/images/a.img" size:S added
Updated by livdywan over 2 years ago
- Description updated
- Status changed from In Progress to Feedback
cdywan wrote:
The rollback went through, but it brought back https://github.com/os-autoinst/os-autoinst/pull/1895 rather than the previous state.
@okurz has now proposed a revert of both in https://github.com/os-autoinst/os-autoinst/pull/1942
Packages are built and the deployment is triggered.
See #105699 for a proposal to conduct a 5 whys analysis.
Updated by okurz over 2 years ago
cdywan wrote:
The rollback went through, but it brought back https://github.com/os-autoinst/os-autoinst/pull/1895 rather than the previous state.
I assumed that the issue, which was only observed in jobs as of today, had been introduced by today's deployment, but I was wrong. https://openqa.suse.de/tests/8047803#next_previous shows the last good job 6 days ago and the first bad job in the scenario, https://openqa.suse.de/tests/8045433 , already at 2022-01-28 00:53, i.e. before this morning's deployment. I had assumed that after the last rollback two days ago ( https://gitlab.suse.de/openqa/osd-deployment/-/jobs/809450 ) you would make sure that an actually fixed state was prepared and merged, but nobody verified that all issues introduced with https://github.com/os-autoinst/os-autoinst/pull/1936 had been fixed.
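The argument above is a simple regression-window check: a deployment can only explain a regression if it happened between the last good job and the first bad job of the scenario. A minimal sketch of that reasoning follows. It uses the job IDs and timestamps quoted in this ticket, with timezones ignored. The ticket does not give the time of the morning deployment, so 07:00 below is a hypothetical placeholder:

```python
from datetime import datetime

# Scenario history, oldest first: (job id, timestamp, good?).
# Timestamps are the ones quoted in this ticket; timezones are ignored.
HISTORY = [
    (8019621, datetime(2022, 1, 23, 1, 41), True),   # last good
    (8045433, datetime(2022, 1, 28, 0, 53), False),  # first bad
    (8047803, datetime(2022, 1, 28, 9, 34), False),
]


def regression_window(history):
    """Return (last good job, first bad job), or None if there is no transition."""
    last_good = None
    for job in history:
        if job[2]:
            last_good = job
        elif last_good is not None:
            return last_good, job
    return None


def could_have_caused(deployed_at, window):
    """A deployment can only explain the regression if it lies inside the window."""
    last_good, first_bad = window
    return last_good[1] < deployed_at <= first_bad[1]


window = regression_window(HISTORY)
print(window[0][0], window[1][0])  # 8019621 8045433
# Hypothetical time for the morning deployment on 2022-01-28.
print(could_have_caused(datetime(2022, 1, 28, 7, 0), window))  # False
```

Any deployment between 2022-01-23 and 2022-01-28 00:53 stays a candidate, which is why the earlier deployment of https://github.com/os-autoinst/os-autoinst/pull/1936 remains the suspect.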
@okurz has now proposed a revert of both in https://github.com/os-autoinst/os-autoinst/pull/1942
That was merged and is scheduled to go out in an extraordinary deployment: https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/308409
As discussed in chat, cdywan and I agreed to hold a lessons-learned session with a five whys analysis of the incident. See #105699.
Updated by okurz over 2 years ago
Deployment succeeded in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/813761 . The changelog includes the double-revert commits. I triggered another run of openqa-label-known-issues and will monitor https://openqa.suse.de/tests/8053716 as an example run.
Updated by livdywan over 2 years ago
- Status changed from Feedback to Resolved
okurz wrote:
Deployment succeeded in https://gitlab.suse.de/openqa/osd-deployment/-/jobs/813761 . The changelog includes the double-revert commits. I triggered another run of openqa-label-known-issues and will monitor https://openqa.suse.de/tests/8053716 as an example run.
The job passed and uploaded all assets successfully.