action #54173
closed [virtualization][xen][sporadic] qemu-img Failed to get "write" lock on xen workers (test fails in bootloader_svirt - test fails in bootloader_start)
Added by riafarov over 5 years ago. Updated almost 4 years ago.
Description
Observation
Formatting '/var/lib/libvirt/images/openQA-SUT-2b.img', fmt=qcow2 size=21474836480 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2019-07-12T10:10:45.345 CEST] [debug] Command's stderr:
qemu-img: /var/lib/libvirt/images/openQA-SUT-2b.img: Failed to get "write" lock
Is another process using the image?
Under certain conditions, a lock on the image file is left behind. There are several ways to address this, such as using a random suffix in the image name plus cleanup. It seems that the destroy command from the previous run fails, so we need to find out why; fixing that would solve the root cause of the issue.
An alternative is to call lsof when the problem occurs to see which process is using the image, and to add more diagnostics so the issue can be investigated when it happens.
We could also retry if the command fails, or test beforehand with the qemu-img info command whether we can get the write lock.
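The retry idea could look roughly like the sketch below. It is written in Python purely for illustration (the openQA test code is not Python), and `create_image_with_retry`, its parameters, and the injectable `run` and `sleep` callables are hypothetical names, not part of os-autoinst. It assumes the lock failure can be recognised by the `Failed to get "write" lock` text on stderr, as in the log above.

```python
import subprocess
import time

# Assumption: this substring on stderr identifies the lock failure.
LOCK_ERROR = 'Failed to get "write" lock'


def run_command(argv):
    """Run a command and return (exit_code, stderr) without raising."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def create_image_with_retry(path, size, attempts=3, delay=5.0,
                            run=run_command, sleep=time.sleep):
    """Try `qemu-img create` up to `attempts` times.

    Retries only when stderr carries the write-lock message; any other
    failure is raised immediately. Returns True on success and False
    when the image is still locked after the last attempt.
    """
    argv = ["qemu-img", "create", "-f", "qcow2", path, str(size)]
    for attempt in range(1, attempts + 1):
        code, stderr = run(argv)
        if code == 0:
            return True
        if LOCK_ERROR not in stderr:
            raise RuntimeError(f"qemu-img failed: {stderr.strip()}")
        if attempt < attempts:
            sleep(delay)  # give the previous holder time to release the lock
    return False
```

A retry like this only hides the symptom: if the VM from the previous job is never destroyed, the lock never goes away and every attempt fails the same way.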
openQA test in scenario sle-12-SP5-Server-DVD-x86_64-minimal+base@yast-xen-pv@svirt-xen-pv fails in bootloader_start
Test suite description
Maintainer: jrivera. Select a minimal textmode installation by starting with the default and unselecting all patterns except for "base" and "minimal". Not to be confused with the new system role "minimal" introduced with SLE15. Test modules 'grub_disable_timeout' and 'grub_test' in xen-pv are not scheduled because grub2 doesn't support the xfb console.
Reproducible
Fails since (at least) Build 0222 (current job)
Expected result
Last good: 0219 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by riafarov over 5 years ago
- Subject changed from [virtualization][xen][sporadic] qemu-img Failed to get "write" lockon xen workers to [virtualization][xen][sporadic] qemu-img Failed to get "write" lock on xen workers
Updated by riafarov over 5 years ago
After investigation with Matthias Griesmeier, we have identified the root cause.
The problem is that for xen, if we call wait_boot anywhere in the test suite (which is the case for any test suite that needs an explicit reboot), we set SVIRT_KEEP_VM_RUNNING in the method attach_to_running.
When this variable is set, no destroy happens even if the job fails, which leads to the problem described above.
It seems that we at least need to unset this variable once the machine has booted, but from the code it looks like attach_to_running is executed when the machine is already running.
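To make the mechanism concrete, here is a toy model of the behaviour described in this comment. It is a Python sketch for illustration only, not the actual os-autoinst code: `should_destroy_vm` is a hypothetical helper, and the assumption is that a truthy SVIRT_KEEP_VM_RUNNING suppresses the destroy regardless of the job result.

```python
def should_destroy_vm(job_vars):
    """Return True when the svirt cleanup would destroy the VM.

    Assumption: a truthy SVIRT_KEEP_VM_RUNNING suppresses the destroy
    unconditionally, whether the job passed or failed.
    """
    return not job_vars.get("SVIRT_KEEP_VM_RUNNING")


def attach_to_running(job_vars):
    """Model only the side effect described above: the variable gets set."""
    job_vars["SVIRT_KEEP_VM_RUNNING"] = 1


job_vars = {}
print(should_destroy_vm(job_vars))     # True: cleanup would destroy the VM
attach_to_running(job_vars)            # reached via wait_boot on xen
print(should_destroy_vm(job_vars))     # False: VM and image lock survive the job
job_vars.pop("SVIRT_KEEP_VM_RUNNING")  # the proposed unset after boot
print(should_destroy_vm(job_vars))     # True again
```

In this model the variable is sticky: once set, nothing resets it before cleanup, so the next job on the same worker instance finds the image still in use.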
Updated by SLindoMansilla over 5 years ago
This ticket is a duplicate of #53849
But, I will keep this one since it has more information about the problem.
Updated by SLindoMansilla over 5 years ago
- Has duplicate action #53849: [sle][functional][u] test fails in bootloader_zkvm - qemu-img create failed added
Updated by szarate over 5 years ago
- Has duplicate action #54758: [tools][u] test fails in bootloader_svirt - Failed to get "write" lock via virsh added
Updated by szarate over 5 years ago
- Status changed from New to In Progress
- Assignee set to szarate
Updated by szarate over 5 years ago
- Has duplicate action #54863: [functional][u] test fails in bootloader_svirt - Missing domains in libvirt but still runnning in XEN. added
Updated by szarate over 5 years ago
- Related to action #55058: [virtualization][xen][u] test fails in bootloader_svirt - domain xml file seems to be corrupt added
Updated by szarate over 5 years ago
After doing some cleanup yesterday, some svirt jobs are able to run; poo#55058 is still there, though.
Updated by riafarov over 5 years ago
Now I can see the same issue on s390x with svirt too, whereas our initial analysis detected the issue only for xen:
Formatting '/var/lib/libvirt/images/openQA-SUT-3a.img', fmt=qcow2 size=32212254720 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2019-08-08T10:59:22.161 CEST] [debug] Command's stderr:
qemu-img: /var/lib/libvirt/images/openQA-SUT-3a.img: Failed to get "write" lock
https://openqa.suse.de/tests/3224320/file/autoinst-log.txt
In the logs we can see that the domain is defined.
Updated by szarate over 5 years ago
- Has duplicate action #54866: [functional][u] test fails in bootloader_svirt added
Updated by szarate over 5 years ago
- Status changed from In Progress to Workable
Updated by riafarov over 5 years ago
- Has duplicate action #55961: [functional][u][sporadic] test fails in bootloader_svirt - qemu-img: Failed to get "write" lock Is another process using the image? added
Updated by SLindoMansilla over 5 years ago
- Subject changed from [virtualization][xen][sporadic] qemu-img Failed to get "write" lock on xen workers to [virtualization][xen][sporadic] qemu-img Failed to get "write" lock on xen workers (test fails in bootloader_svirt - test fails in bootloader_start)
Given the number of duplicates, let's make it easier for openQA result reviewers to find this ticket.
Updated by szarate over 5 years ago
- Status changed from Workable to In Progress
Updated by szarate over 5 years ago
- Status changed from In Progress to Feedback
PR is open, waiting for reviews.
Updated by szarate over 5 years ago
PR merged, waiting for deployment; however, I guess that some more changes will be needed.
Updated by szarate over 5 years ago
Next step: Reproduce the actual failure https://progress.opensuse.org/issues/54173#note-11, and survive it.
Updated by szarate over 5 years ago
Now the failure is simply "handled", but virsh dominfo openQA-SUT-$instance --title may be needed. I need to check other jobs in this run to see whether the patch actually managed to solve anything.
Updated by szarate about 5 years ago
Now I have a data point: This job failed with this problem... https://openqa.suse.de/tests/3591586
Will look a bit deeper later.
Updated by szarate about 5 years ago
- Blocks action #58787: [qe-core] test fails in bootloader_svirt - qemu-img create with backing file failed added
Updated by JRivrain about 5 years ago
I see that in this case https://openqa.suse.de/tests/3698948 where bootloader_svirt was replaced by bootloader_start, it is still happening.
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default@svirt-hyperv-uefi
https://openqa.suse.de/tests/3822960
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default@svirt-hyperv-uefi
https://openqa.suse.de/tests/3885550
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@svirt-xen-hvm
https://openqa.suse.de/tests/3928603
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default@svirt-hyperv-uefi
https://openqa.suse.de/tests/3983226
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by openqa_review almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default@svirt-hyperv-uefi
https://openqa.suse.de/tests/4038458
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by szarate over 4 years ago
https://gitlab.suse.de/foursixnine/openqa-job-retrigger-tool/-/merge_requests/2 For now this is running on phobos; I am working on a container to have it on gitlab.
Updated by okurz over 4 years ago
Interesting approach :) I suggest you also take a look at https://github.com/os-autoinst/scripts/blob/master/openqa-monitor-incompletes which I call combined with https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues and this is automatically called in https://gitlab.suse.de/openqa/auto-review/pipelines . What is done there is that autoinst-log.txt is automatically parsed against regexes defined in progress tickets' subject lines, based on special markers. This can easily be extended by replacing openqa-monitor-incompletes with a tool to query for the failed jobs you want to find labels for, if this is what you want to achieve? :) If you like, we can have a brainstorm chat about that?
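The matching idea described here can be sketched as follows. This is a minimal Python illustration, not the actual openqa-label-known-issues implementation: the marker syntax `auto_review:"<regex>"`, the helper names, and the example subject line are assumptions, so check the script for the exact format.

```python
import re

# Assumed marker syntax inside a ticket subject: auto_review:"<regex>"
MARKER = re.compile(r'auto_review:"(?P<regex>[^"]+)"')


def extract_regex(subject):
    """Return the regex embedded in a ticket subject, or None if absent."""
    match = MARKER.search(subject)
    return match.group("regex") if match else None


def label_for_log(log_text, tickets):
    """Return the id of the first ticket whose regex matches the log.

    `tickets` maps ticket id -> subject line. Returns None if nothing matches.
    """
    for ticket_id, subject in tickets.items():
        pattern = extract_regex(subject)
        if pattern and re.search(pattern, log_text):
            return ticket_id
    return None


# Hypothetical subject line; the real ticket subject carries no such marker.
tickets = {54173: 'qemu-img lock on xen workers auto_review:"Failed to get .write. lock"'}
log = 'qemu-img: /var/lib/libvirt/images/openQA-SUT-2b.img: Failed to get "write" lock\n'
print(label_for_log(log, tickets))
```

Because the marker is delimited by double quotes in this sketch, the regex itself cannot contain them, which is why the example matches the quotes around "write" with `.` instead.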
Updated by szarate almost 4 years ago
- Status changed from Feedback to Closed
This hasn't been seen for quite a long time, closing
Updated by szarate almost 4 years ago
- Related to action #43673: [functional][u] - test fails in bootloader_zkvm - worker couldn't get write lock for the disk image added
Updated by okurz almost 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_minimal_base+sdk_withhome@s390x-kvm-sle15
https://openqa.suse.de/tests/5421382
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed