Project

General

Profile

Actions

action #178324

closed

coordination #176337: [saga][epic] Stable os-autoinst backends with stable command execution (no mistyping)

coordination #125708: [epic] Future ideas for more stable non-qemu backends

[sporadic] svirt s390x tests sometimes time out while syncing assets auto_review:"LIBSSH2_ERROR_TIMEOUT[\s\S]*rsync":retry size:S

Added by mkittler 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2025-02-10
Due date:
% Done:

0%

Estimated time:

Description

Observation

After #176868 was resolved the timeout for syncing assets is enforced correctly when it expires:

[2025-03-04T01:57:33.064193Z] [debug] [pid:34725] Using existing SSH connection (key:hostname=s390zl12.oqa.prg2.suse.org,username=root,port=22)
[2025-03-04T02:12:35.031481Z] [debug] [pid:34725] [run_ssh_cmd(rsync --timeout='150' --stats -av '/var/lib/openqa/share/factory/hdd/sle-15-SP3-s390x-5.3.18-150300.268.1.gd2bdf5f-Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-with-ltp.qcow2' '/var/lib/libvirt/images//sle-15-SP3-s390x-5.3.18-150300.268.1.gd2bdf5f-Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-with-ltp.qcow2')] stdout:
  sending incremental file list
  sle-15-SP3-s390x-5.3.18-150300.268.1.gd2bdf5f-Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-with-ltp.qcow2
…
  Time out waiting for data (-9 LIBSSH2_ERROR_TIMEOUT) at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/Net/SSH2.pm line 51.
    Net::SSH2::die_with_error(Net::SSH2=SCALAR(0x562b7a3cd8c8)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 1328
    backend::baseclass::run_ssh_cmd(backend::svirt=HASH(0x562b7a84d618), "rsync --timeout='150' --stats -av '/var/lib/openqa/share/fact"..., "username", "root", "hostname", "s390zl12.oqa.prg2.suse.org", "password", "Nots3cr3t-\@3-vt", ...) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 674
    consoles::sshVirtsh::run_cmd(consoles::sshVirtsh=HASH(0x562b79c875c8), "rsync --timeout='150' --stats -av '/var/lib/openqa/share/fact"..., "timeout", 900) called at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 396

The underlying problem that this rsync command can take very long (or even gets stuck) hasn't been resolved, though. The problem affected multiple jobs (see #176868#note-17) but more recent jobs look good again.

We need to figure out whether the download is really that slow or whether the SSH connection is for some reason going stale. In the last case it would perhaps help to retry the download with a fresh connection.

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#178324

Suggestions


Related issues 3 (1 open2 closed)

Related to openQA Project (public) - action #179077: [qe-core]test fails in bootloader_start on s390x. speed of downloading initrd file is slow Closedmkittler2025-03-18

Actions
Copied from openQA Project (public) - action #176868: Random SSH failures on s390x workers size:SResolvedmkittler2025-02-10

Actions
Copied to openQA Project (public) - action #179404: [sporadic] svirt s390x tests sometimes time out while syncing assets auto_review:"LIBSSH2_ERROR_TIMEOUT[\s\S]*rsync":retryNew2025-02-10

Actions
Actions

Also available in: Atom PDF