action #137558
closed[tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems
0%
Description
Observation¶
Many openQA jobs are failed at "rsync" disk images to s390x SUTs with error messages like:
"RSYNC_PASSWORD='xxxxxxx' rsync -av 'xxxxx.qcow2' '@s390_suts_xxxx.suse.de:/var/lib/libvirt/images/xxxxx.qcow2'" unexpectedly returned exit value 255 at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 566.
So can tools team take a look at this issue?
openQA test in scenario sle-15-SP6-Migration-from-SLE12-SPx-s390x-autoupgrade_sles12sp5_pscc_base_all_full_auto@s390x-kvm-sle12 fails in
bootloader_zkvm
Test suite description¶
The base test suite is used for job templates defined in YAML documents. It has no settings of its own.
Reproducible¶
Fails since (at least) Build 26.1
Expected result¶
Last good: 20.1 (or more recent)
Further details¶
Always latest result in this scenario: latest
Updated by tinawang123 about 1 year ago
Also failed at x86 jobs:
https://openqa.suse.de/tests/12410293#step/bootloader_start/16
Updated by okurz about 1 year ago
- Priority changed from Normal to Urgent
- Target version set to Ready
Updated by jbaier_cz about 1 year ago
Maybe related to https://github.com/os-autoinst/os-autoinst/pull/2357
Updated by okurz about 1 year ago
Created revert PR https://github.com/os-autoinst/os-autoinst/pull/2380 due to significant regression, see
https://openqa.suse.de/tests/overview?modules=bootloader_zkvm&modules_result=failed&build=26.1&distri=sle&version=15-SP6&groupid=265#
Whoever is the next person after this revert is deployed please retrigger all according test failures.
Updated by tinita about 1 year ago
- Status changed from New to In Progress
- Assignee set to tinita
Updated by tinita about 1 year ago
- Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems to [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255"
Updated by tinita about 1 year ago
So apparently the username is empty, which is also confirmed by the warning in the log:
Use of uninitialized value $c{"username"} in concatenation (.) or string at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 361.
Updated by okurz about 1 year ago
- Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255" to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255"
Updated by tinita about 1 year ago
- Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255" to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255":retry
Updated by tinita about 1 year ago
- Priority changed from Urgent to High
I restarted all failed with this command:
host=openqa.suse.de comment_query=' ' failed_since="'2023-10-06'" ./openqa-monitor-investigation-candidates >candidates.137558
cat candidates.137558 | host=openqa.suse.de ./openqa-label-known-issues-multi
Updated by tinita about 1 year ago
- Status changed from In Progress to Feedback
https://openqa.suse.de/tests/overview?modules=bootloader_zkvm&build=26.1&distri=sle&version=15-SP6&groupid=265# looks better now.
The build https://openqa.suse.de/tests/overview?state=assigned&state=setup&state=running&state=uploading&state=scheduled&distri=sle&version=15-SP6&build=26.1&groupid=265 mostly only has ppc64le jobs waiting to run.
Updated by livdywan about 1 year ago
Apparently @mkittler came up with a better version of the original svirt backend change: https://github.com/os-autoinst/os-autoinst/pull/2381
Updated by okurz about 1 year ago
Merged and deployed meanwhile https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1891096#L1243
Updated by tinita about 1 year ago
- Status changed from Feedback to Resolved
I found two tests from yesterday with this ticket referenced in the comment:
host=openqa.suse.de ./openqa-query-for-job-label 137558
12443491|2023-10-10 14:57:35|done|failed|slem_virtualization||worker31
12443486|2023-10-10 14:55:41|done|failed|slem_virtualization||worker31
...
Both were takeover comments. However, they failed both in a later step with a different error, so I deleted the comments.
https://openqa.suse.de/tests/12443486#comments
https://openqa.suse.de/tests/12443491#comments
So it seems to be fine now.
Updated by mdati about 1 year ago
- Status changed from Resolved to Feedback
Please, during last checks,after tests rerun retry attempt same issue re-occurring today, in tests:
sle-15-SP4-JeOS-for-kvm-and-xen-Updates-x86_64-Build20231015-1-jeos-kdump@svirt-xen-hvm
jeos-containers-podman@svirt-xen-pv
jeos-filesystem@svirt-xen-pv
jeos-fips@svirt-xen-pv
jeos-kdump@svirt-xen-hvm
jeos-kdump@svirt-xen-pv
https://openqa.suse.de/tests/12540540#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540538#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540539#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540553#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540538#step/bootloader_svirt/15
Desc.:
# Test died: {
"wantarray" => undef,
"cmd" => "backend_proxy_console_call",
"function" => "add_disk",
"args" => [
{
"backingfile" => 1,
"dev_id" => "a",
"file" => "/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2"
}
],
"json_cmd_token" => "lbavhMTS",
"console" => "svirt"
}
"sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking=no' -av 'SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' 'root@unreal6.qe.nue2.suse.org:/var/lib/libvirt/images//SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2'" unexpectedly returned exit value 11 at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 566.
IPC::System::Simple::_check_exit("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"..., 11, ARRAY(0x56006f8f6ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 538
IPC::System::Simple::_process_child_error(2816, "sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"..., ARRAY(0x56006f8f6ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 178
IPC::System::Simple::run("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at (eval 388) line 12
eval {...} called at (eval 388) line 11
Fatal::__ANON__("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 355
consoles::sshVirtsh::_system("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 361
consoles::sshVirtsh::_copy_image_else(consoles::sshVirtsh=HASH(0x5600749d02c8), "/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM"..., "SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2", "/var/lib/libvirt/images/") called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 384
consoles::sshVirtsh::_copy_image_to_vm_host(consoles::sshVirtsh=HASH(0x5600749d02c8), HASH(0x560073a8c2e0), "/vmfs/volumes//openQA/", "openQA-SUT-2a.img", "openQA-SUT-2", "/var/lib/libvirt/images/", undef) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 447
consoles::sshVirtsh::add_disk(consoles::sshVirtsh=HASH(0x5600749d02c8), HASH(0x560073a8c2e0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 820
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 815
backend::baseclass::proxy_console_call(backend::svirt=HASH(0x5600733d7348), HASH(0x560073d35e48)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 79
backend::baseclass::handle_command(backend::svirt=HASH(0x5600733d7348), HASH(0x56007299a028)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 616
backend::baseclass::check_socket(backend::svirt=HASH(0x5600733d7348), IO::Handle=GLOB(0x5600732b7bc8), 0) called at /usr/lib/os-autoinst/backend/svirt.pm line 298
backend::svirt::check_socket(backend::svirt=HASH(0x5600733d7348), IO::Handle=GLOB(0x5600732b7bc8), 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 284
backend::baseclass::do_capture(backend::svirt=HASH(0x5600733d7348), undef, 1697456074.60292) called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
backend::baseclass::run_capture_loop(backend::svirt=HASH(0x5600733d7348)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 133
backend::baseclass::run(backend::svirt=HASH(0x5600733d7348), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 68
backend::driver::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0), CODE(0x560073e79a88)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
backend::driver::start(backend::driver=HASH(0x560074c2b9a0)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
backend::driver::new("backend::driver", "svirt") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 100
OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x56006c9321d8)) called at /usr/bin/isotovideo line 134
eval {...} called at /usr/bin/isotovideo line 122
at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 355
at /usr/lib/os-autoinst/backend/console_proxy.pm line 46.
backend::console_proxy::__ANON__(undef, HASH(0x56007407a6d0)) called at sle/tests/installation/bootloader_svirt.pm line 157
bootloader_svirt::run(bootloader_svirt=HASH(0x560072c0d430)) called at /usr/lib/os-autoinst/basetest.pm line 352
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
basetest::runtest(bootloader_svirt=HASH(0x560072c0d430)) called at /usr/lib/os-autoinst/autotest.pm line 400
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 400
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 257
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 257
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 308
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50), CODE(0x5600743de2f0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50)) called at /usr/lib/os-autoinst/autotest.pm line 310
autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x56006c9321d8)) called at /usr/bin/isotovideo line 133
eval {...} called at /usr/bin/isotovideo line 122
Updated by tinita about 1 year ago
- Assignee changed from tinita to mkittler
@mkittler is preparing a partial revert
Updated by tinita about 1 year ago
Note that the 5 linked jobs all have the following rsync output:
Could not create directory '/var/lib/empty/.ssh' (Permission denied).
Failed to add the host to the list of known hosts (/var/lib/empty/.ssh/known_hosts).
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2
rsync: [receiver] write failed on "/var/lib/libvirt/images/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(378) [receiver=3.2.3]
Updated by okurz about 1 year ago
- Target version changed from Ready to 640
As mkittler is working on that and I trust him to know what to do in the next step I am selecting "QE Kernel Current" which I assume is the correct way for kernel squad.
Updated by mkittler about 1 year ago
I haven't been working on it except trying to help people in chat and fixing the initial problems mentioned in the ticket description. The issues in further comments are a different problem.
You can create a revert of my changes via https://github.com/Martchus/os-autoinst/pull/new/revert-svirt-asset-handling in case you want that. I don't think it'll help with the problem No space left on device (28)
, though. Or is there enough space and this is a false error?
Updated by mkittler about 1 year ago
- Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255":retry to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems
- Status changed from Feedback to Resolved
Looks like there's enough space on that host:
unreal6:~ # df -h /var/lib/libvirt/images/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/openqa_vg-openqa_lv 110G 50G 55G 48% /var/lib/libvirt/images
And copying also generally works:
martchus@openqa:/var/lib/openqa/share/factory/hdd/fixed> sshpass -p '…' rsync -e 'ssh -o StrictHostKeyChecking=no' -av 'SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' 'root@unreal6.qe.nue2.suse.org:/var/lib/libvirt/images//SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2'
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2
sent 69,041 bytes received 120,593 bytes 22,309.88 bytes/sec
total size is 296,353,792 speedup is 1,562.77
The latest jobs are also passing despite my change still being in place:
[2023-10-17T23:42:09.698321+02:00] [debug] [pid:33719] Syncing '/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' directly from worker host to unreal6.qe.nue2.suse.org
Could not create directory '/var/lib/empty/.ssh' (Permission denied).
Failed to add the host to the list of known hosts (/var/lib/empty/.ssh/known_hosts).
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2
sent 69,051 bytes received 120,593 bytes 19,962.53 bytes/sec
I assume someone simply made sure there's again enough disk space.
Note that it makes no sense to associate all possible failures of this rsync command with this ticket which was about the initial problem of that rsync command when I first introduced it. I removed the auto review regex because no such errors should happen again (as the RSYNC_PASSWORD
part has been removed from the command entirely).
I did one round of host=openqa.suse.de ./openqa-query-for-job-label 137558
but that must be enough. Getting rid of all the wrongly taken over bugrefs would take quite long.
Updated by sebchlad about 1 year ago
- Target version changed from 640 to QE Kernel Done
Updated by syrianidou_sofia about 1 year ago
We still experience this issue. For the latest build we had two relevant failures for svirt-xen-pv:
https://openqa.suse.de/tests/12639802
https://openqa.suse.de/tests/12639883
Probably not related , but worths mentioning that all jobs fail in yast job group for the particular machine. Most just fail to get a vnc graphical connetion: https://openqa.suse.de/tests/12639884#step/setup_libyui/1