Project

General

Profile

Actions

action #137558

closed

[tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems

Added by rfan1 7 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Bugs in existing tests
Target version:
QE Kernel - QE Kernel Done
Start date:
2023-10-07
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

Many openQA jobs are failed at "rsync" disk images to s390x SUTs with error messages like:

"RSYNC_PASSWORD='xxxxxxx' rsync -av 'xxxxx.qcow2' '@s390_suts_xxxx.suse.de:/var/lib/libvirt/images/xxxxx.qcow2'" unexpectedly returned exit value 255 at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 566.

So can tools team take a look at this issue?


openQA test in scenario sle-15-SP6-Migration-from-SLE12-SPx-s390x-autoupgrade_sles12sp5_pscc_base_all_full_auto@s390x-kvm-sle12 fails in
bootloader_zkvm

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 26.1

Expected result

Last good: 20.1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by rfan1 7 months ago

  • Description updated (diff)
Actions #3

Updated by okurz 7 months ago

  • Priority changed from Normal to Urgent
  • Target version set to Ready
Actions #5

Updated by okurz 7 months ago

Created revert PR https://github.com/os-autoinst/os-autoinst/pull/2380 due to significant regression, see
https://openqa.suse.de/tests/overview?modules=bootloader_zkvm&modules_result=failed&build=26.1&distri=sle&version=15-SP6&groupid=265#

Whoever is the next person after this revert is deployed please retrigger all according test failures.

Actions #6

Updated by tinita 7 months ago

  • Status changed from New to In Progress
  • Assignee set to tinita
Actions #7

Updated by tinita 7 months ago

  • Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems to [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255"
Actions #8

Updated by tinita 7 months ago

So apparently the username is empty, which is also confirmed by the warning in the log:

Use of uninitialized value $c{"username"} in concatenation (.) or string at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 361.
Actions #9

Updated by okurz 7 months ago

  • Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to "rsync" image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255" to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255"
Actions #10

Updated by tinita 7 months ago

  • Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255" to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255":retry
Actions #11

Updated by tinita 7 months ago

  • Priority changed from Urgent to High

I restarted all failed with this command:

host=openqa.suse.de comment_query=' ' failed_since="'2023-10-06'"  ./openqa-monitor-investigation-candidates >candidates.137558
cat candidates.137558 | host=openqa.suse.de ./openqa-label-known-issues-multi 
Actions #13

Updated by livdywan 7 months ago

Apparently @mkittler came up with a better version of the original svirt backend change: https://github.com/os-autoinst/os-autoinst/pull/2381

Actions #15

Updated by tinita 7 months ago

  • Status changed from Feedback to Resolved

I found two tests from yesterday with this ticket referenced in the comment:

host=openqa.suse.de ./openqa-query-for-job-label 137558
12443491|2023-10-10 14:57:35|done|failed|slem_virtualization||worker31
12443486|2023-10-10 14:55:41|done|failed|slem_virtualization||worker31
...

Both were takeover comments. However, they failed both in a later step with a different error, so I deleted the comments.
https://openqa.suse.de/tests/12443486#comments
https://openqa.suse.de/tests/12443491#comments

So it seems to be fine now.

Actions #16

Updated by mdati 7 months ago

  • Status changed from Resolved to Feedback

Please, during last checks,after tests rerun retry attempt same issue re-occurring today, in tests:
sle-15-SP4-JeOS-for-kvm-and-xen-Updates-x86_64-Build20231015-1-jeos-kdump@svirt-xen-hvm

jeos-containers-podman@svirt-xen-pv
jeos-filesystem@svirt-xen-pv
jeos-fips@svirt-xen-pv
jeos-kdump@svirt-xen-hvm
jeos-kdump@svirt-xen-pv

https://openqa.suse.de/tests/12540540#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540538#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540539#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540553#step/bootloader_svirt/15
https://openqa.suse.de/tests/12540538#step/bootloader_svirt/15
Desc.:

# Test died: {
  "wantarray" => undef,
  "cmd" => "backend_proxy_console_call",
  "function" => "add_disk",
  "args" => [
              {
                "backingfile" => 1,
                "dev_id" => "a",
                "file" => "/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2"
              }
            ],
  "json_cmd_token" => "lbavhMTS",
  "console" => "svirt"
}
"sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking=no' -av 'SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' 'root@unreal6.qe.nue2.suse.org:/var/lib/libvirt/images//SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2'" unexpectedly returned exit value 11 at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 566.
    IPC::System::Simple::_check_exit("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"..., 11, ARRAY(0x56006f8f6ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 538
    IPC::System::Simple::_process_child_error(2816, "sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"..., ARRAY(0x56006f8f6ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/IPC/System/Simple.pm line 178
    IPC::System::Simple::run("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at (eval 388) line 12
    eval {...} called at (eval 388) line 11
    Fatal::__ANON__("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 355
    consoles::sshVirtsh::_system("sshpass -p '[HIDDEN]' rsync -e 'ssh -o StrictHostKeyChecking"...) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 361
    consoles::sshVirtsh::_copy_image_else(consoles::sshVirtsh=HASH(0x5600749d02c8), "/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM"..., "SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2", "/var/lib/libvirt/images/") called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 384
    consoles::sshVirtsh::_copy_image_to_vm_host(consoles::sshVirtsh=HASH(0x5600749d02c8), HASH(0x560073a8c2e0), "/vmfs/volumes//openQA/", "openQA-SUT-2a.img", "openQA-SUT-2", "/var/lib/libvirt/images/", undef) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 447
    consoles::sshVirtsh::add_disk(consoles::sshVirtsh=HASH(0x5600749d02c8), HASH(0x560073a8c2e0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 820
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 815
    backend::baseclass::proxy_console_call(backend::svirt=HASH(0x5600733d7348), HASH(0x560073d35e48)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 79
    backend::baseclass::handle_command(backend::svirt=HASH(0x5600733d7348), HASH(0x56007299a028)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 616
    backend::baseclass::check_socket(backend::svirt=HASH(0x5600733d7348), IO::Handle=GLOB(0x5600732b7bc8), 0) called at /usr/lib/os-autoinst/backend/svirt.pm line 298
    backend::svirt::check_socket(backend::svirt=HASH(0x5600733d7348), IO::Handle=GLOB(0x5600732b7bc8), 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 284
    backend::baseclass::do_capture(backend::svirt=HASH(0x5600733d7348), undef, 1697456074.60292) called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
    backend::baseclass::run_capture_loop(backend::svirt=HASH(0x5600733d7348)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 133
    backend::baseclass::run(backend::svirt=HASH(0x5600733d7348), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 68
    backend::driver::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0), CODE(0x560073e79a88)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x560074060ee0)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
    backend::driver::start(backend::driver=HASH(0x560074c2b9a0)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
    backend::driver::new("backend::driver", "svirt") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
    OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 100
    OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x56006c9321d8)) called at /usr/bin/isotovideo line 134
    eval {...} called at /usr/bin/isotovideo line 122
 at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 355
 at /usr/lib/os-autoinst/backend/console_proxy.pm line 46.
    backend::console_proxy::__ANON__(undef, HASH(0x56007407a6d0)) called at sle/tests/installation/bootloader_svirt.pm line 157
    bootloader_svirt::run(bootloader_svirt=HASH(0x560072c0d430)) called at /usr/lib/os-autoinst/basetest.pm line 352
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 346
    basetest::runtest(bootloader_svirt=HASH(0x560072c0d430)) called at /usr/lib/os-autoinst/autotest.pm line 400
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 400
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 257
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 257
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 308
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50), CODE(0x5600743de2f0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5600740a1f50)) called at /usr/lib/os-autoinst/autotest.pm line 310
    autotest::start_process() called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 94
    OpenQA::Isotovideo::Runner::start_autotest(OpenQA::Isotovideo::Runner=HASH(0x56006c9321d8)) called at /usr/bin/isotovideo line 133
    eval {...} called at /usr/bin/isotovideo line 122
Actions #17

Updated by tinita 7 months ago

  • Assignee changed from tinita to mkittler

@mkittler is preparing a partial revert

Actions #18

Updated by tinita 7 months ago

Note that the 5 linked jobs all have the following rsync output:

Could not create directory '/var/lib/empty/.ssh' (Permission denied).
Failed to add the host to the list of known hosts (/var/lib/empty/.ssh/known_hosts).
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2
rsync: [receiver] write failed on "/var/lib/libvirt/images/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(378) [receiver=3.2.3]
Actions #19

Updated by okurz 6 months ago

  • Target version changed from Ready to 640

As mkittler is working on that and I trust him to know what to do in the next step I am selecting "QE Kernel Current" which I assume is the correct way for kernel squad.

Actions #20

Updated by mkittler 6 months ago

I haven't been working on it except trying to help people in chat and fixing the initial problems mentioned in the ticket description. The issues in further comments are a different problem.

You can create a revert of my changes via https://github.com/Martchus/os-autoinst/pull/new/revert-svirt-asset-handling in case you want that. I don't think it'll help with the problem No space left on device (28), though. Or is there enough space and this is a false error?

Actions #21

Updated by mkittler 6 months ago

  • Subject changed from [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems auto_review:"Test died:[\s\S]*RSYNC_PASSWORD.*rsync -av.*unexpectedly returned exit value 255":retry to [tools][s390x][zkvm]test fails in bootloader_zkvm due to 'rsync' image problems
  • Status changed from Feedback to Resolved

Looks like there's enough space on that host:

unreal6:~ # df -h /var/lib/libvirt/images/
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/openqa_vg-openqa_lv  110G   50G   55G  48% /var/lib/libvirt/images

And copying also generally works:

martchus@openqa:/var/lib/openqa/share/factory/hdd/fixed> sshpass -p '…' rsync -e 'ssh -o StrictHostKeyChecking=no' -av 'SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' 'root@unreal6.qe.nue2.suse.org:/var/lib/libvirt/images//SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2'
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2

sent 69,041 bytes  received 120,593 bytes  22,309.88 bytes/sec
total size is 296,353,792  speedup is 1,562.77

The latest jobs are also passing despite my change still being in place:

[2023-10-17T23:42:09.698321+02:00] [debug] [pid:33719] Syncing '/var/lib/openqa/share/factory/hdd/fixed/SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2' directly from worker host to unreal6.qe.nue2.suse.org
Could not create directory '/var/lib/empty/.ssh' (Permission denied).
Failed to add the host to the list of known hosts (/var/lib/empty/.ssh/known_hosts).
sending incremental file list
SLES15-SP4-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2

sent 69,051 bytes  received 120,593 bytes  19,962.53 bytes/sec

I assume someone simply made sure there's again enough disk space.


Note that it makes no sense to associate all possible failures of this rsync command with this ticket which was about the initial problem of that rsync command when I first introduced it. I removed the auto review regex because no such errors should happen again (as the RSYNC_PASSWORD part has been removed from the command entirely).

I did one round of host=openqa.suse.de ./openqa-query-for-job-label 137558 but that must be enough. Getting rid of all the wrongly taken over bugrefs would take quite long.

Actions #22

Updated by sebchlad 6 months ago

  • Target version changed from 640 to QE Kernel Done
Actions #23

Updated by syrianidou_sofia 6 months ago

We still experience this issue. For the latest build we had two relevant failures for svirt-xen-pv:
https://openqa.suse.de/tests/12639802
https://openqa.suse.de/tests/12639883
Probably not related , but worths mentioning that all jobs fail in yast job group for the particular machine. Most just fail to get a vnc graphical connetion: https://openqa.suse.de/tests/12639884#step/setup_libyui/1

Actions

Also available in: Atom PDF