action #99345
Updated by livdywan over 2 years ago
## Observation

openQA test in scenario sle-12-SP4-Server-DVD-Updates-s390x-mru-install-minimal-with-addons@s390x-kvm-sle12 incomplete, stops at [start_install](https://openqa.suse.de/tests/7240293/modules/start_install/steps/6)

## Test suite description

Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml.

## Reproducible

Fails since (at least) Build [20210927-1](https://openqa.suse.de/tests/7240293) (current job)

Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , `openqa-query-for-job-label poo#99345`

## Expected result

Last good: [20210925-1](https://openqa.suse.de/tests/7228955) (or more recent)

## Acceptance criteria

* **AC1:** The root cause of the problem is known
* **AC2:** The next steps are known and have been initiated

## Suggestions

* Talk to all the people involved to get the full story

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Updates&machine=s390x-kvm-sle12&test=mru-install-minimal-with-addons&version=12-SP4)

IPMI and s390x workers keep losing the VNC connection during SLES installation, and the reconnect attempt gets stuck for an unknown reason until the job hits MAX_JOB_TIME:

```
[2022-05-11T13:46:24.602260+02:00] [debug] <<< testapi::wait_screen_change(timeout=10, similarity_level=50)
XIO: fatal IO error 4 (Interrupted system call) on X server ":37191" after 23426 requests (23426 known processed) with 0 events remaining.
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":34867" after 28852 requests (28852 known processed) with 0 events remaining.
[2022-05-11T15:36:53.708309+02:00] [debug] autotest received signal TERM, saving results of current test before exiting
[2022-05-11T15:36:53.708518+02:00] [debug] isotovideo received signal TERM
[2022-05-11T15:36:53.708516+02:00] [debug] backend got TERM
```

Note that the job spent 110 minutes in `wait_screen_change()`, which was supposed to time out after 10 seconds. In another job it was stuck on `assert_screen`:

```
[2022-05-11T15:58:59.169408+02:00] [debug] <<< testapi::assert_screen(mustmatch="installation", no_wait=1, timeout=30)
[2022-05-11T16:15:16.486959+02:00] [warn] !!! consoles::VNC::catch {...} : Error in VNC protocol - relogin: short read for zrle data 659 - 950
[2022-05-11T21:47:05.966962+02:00] [debug] backend got TERM
[2022-05-11T21:47:05.966980+02:00] [debug] isotovideo received signal TERM
[2022-05-11T21:47:05.967084+02:00] [debug] autotest received signal TERM, saving results of current test before exiting
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":44341" after 28689 requests (28689 known processed) with 0 events remaining.
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":60785" after 39307 requests (39307 known processed) with 0 events remaining.
```

Here we even got a VNC error, and the VNC client would try to log in again, but I suppose that is pointless because the VNC server terminates when the connection is lost anyway. So for a real retry we would likely also need to restart the VNC server. (Note that @MDoucha tried to connect manually here, see https://progress.opensuse.org/issues/99345#note-9)
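To check how long a job sat in a single call, such as the 110 minutes in `wait_screen_change()` noted earlier, the largest gap between consecutive timestamped log lines can be computed from the log text. The following is only a rough sketch and not part of os-autoinst or openQA: the names `TIMESTAMP` and `largest_gap` are made up for illustration, and it assumes every relevant line starts with a bracketed ISO 8601 timestamp like the ones in the excerpts here.

```python
import re
from datetime import datetime

# Assumption: timestamps look like [2022-05-11T13:46:24.602260+02:00],
# i.e. ISO 8601 with microseconds and a numeric UTC offset.
TIMESTAMP = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}[+-]\d{2}:\d{2})\]"
)


def largest_gap(log_text):
    """Return (gap, start, end) for the largest gap between two
    consecutive timestamps in log_text, or None if fewer than two
    timestamps are found. Lines without a timestamp (for example the
    XIO errors) are simply skipped."""
    stamps = [datetime.fromisoformat(s) for s in TIMESTAMP.findall(log_text)]
    if len(stamps) < 2:
        return None
    # Compare each timestamp with its successor and keep the widest pair.
    pairs = zip(stamps, stamps[1:])
    start, end = max(pairs, key=lambda pair: pair[1] - pair[0])
    return end - start, start, end
```

Fed with the first log excerpt, this reports a gap of roughly 110 minutes between the `wait_screen_change` call and the first TERM message, which is far beyond the 10-second timeout passed to the call.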