action #99123
closed
ssh based backends can run into timeout if ssh connection is stuck
Description
Observation
From https://suse.slack.com/archives/C02CANHLANP/p1632408138493900
There are also a lot of jobs failed on bootloader for PowerPC: https://openqa.suse.de/tests/7200264#step/bootloader_start/3
This job ran into the default openQA 2h timeout. Excerpt from the log:
[2021-09-23T05:35:13.884 CEST] [debug] <<< backend::baseclass::run_ssh(cmd="! lssyscfg -m redcurrant -r lpar --filter 'lpar_ids=8' -F state | grep -i 'not activated' -q", password="SECRET", username="hscroot", wantarray=0, keep_open=0, hostname="powerhmc1.arch.suse.de")
[2021-09-23T05:35:13.885 CEST] [debug] <<< backend::baseclass::new_ssh_connection(wantarray=0, hostname="powerhmc1.arch.suse.de", keep_open=0, blocking=1, password="SECRET", username="hscroot")
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":39057"
after 39145 requests (39145 known processed) with 0 events remaining.
[2021-09-23T07:28:47.185 CEST] [debug] backend got TERM
So something did not time out properly within 2h. Could it be the lssyscfg command?
Suggestions
I suggest improving the ssh command so that it does not stay stuck for 2h but times out after a reasonable time. That would be a start.
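To illustrate the idea of bounding a remote command, here is a minimal sketch in Python. It is not the backend's actual code (which is not shown in this ticket); the helper name `run_with_timeout` and the 1 second limit in the demo are assumptions made purely for illustration. The point is only that the caller gets control back after a fixed time instead of waiting for the job-level 2h timeout.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and wait at most timeout_s seconds.

    Returns (exit_code, stdout) on completion, or (None, None) if the
    command did not finish in time. Hypothetical helper, for illustration.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None, None
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for a stuck remote command: sleeps far longer than the limit.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    code, _ = run_with_timeout(stuck, timeout_s=1)
    print("timed out" if code is None else f"exit code {code}")
```

With a real `ssh` client, connection setup can additionally be bounded with the OpenSSH option `-o ConnectTimeout=<seconds>`. Whether the hang seen here happened during connection setup or while the command was running is not clear from the log, so an overall limit like the one sketched above would cover both cases.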