action #99345

Updated by livdywan almost 2 years ago

## Observation 

 openQA test in scenario sle-12-SP4-Server-DVD-Updates-s390x-mru-install-minimal-with-addons@s390x-kvm-sle12 incomplete, stops at 
 [start_install](https://openqa.suse.de/tests/7240293/modules/start_install/steps/6) 

 ## Test suite description 
 Testsuite maintained at https://gitlab.suse.de/qa-maintenance/qam-openqa-yml. 


 ## Reproducible 

 Fails since (at least) Build [20210927-1](https://openqa.suse.de/tests/7240293) (current job) 

 Find jobs referencing this ticket with the help of 
 https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , 
 `openqa-query-for-job-label poo#99345` 


 ## Expected result 

 Last good: [20210925-1](https://openqa.suse.de/tests/7228955) (or more recent) 

 


 ## Acceptance criteria 
 * **AC1:** The root cause of the problem is known 
 * **AC2:** The next steps are known and have been initiated 

 ## Suggestions 
 * Talk to all the people involved to get the full story 

 ## Further details 

 Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Updates&machine=s390x-kvm-sle12&test=mru-install-minimal-with-addons&version=12-SP4) 

 IPMI and s390x workers keep losing the VNC connection during SLES installation, and the reconnect attempt then gets stuck for an unknown reason until the job hits MAX_JOB_TIME: 

 ``` 
 [2022-05-11T13:46:24.602260+02:00] [debug] <<< testapi::wait_screen_change(timeout=10, similarity_level=50) 
 XIO:    fatal IO error 4 (Interrupted system call) on X server ":37191" 
       after 23426 requests (23426 known processed) with 0 events remaining. 
 XIO:    fatal IO error 11 (Resource temporarily unavailable) on X server ":34867" 
       after 28852 requests (28852 known processed) with 0 events remaining. 
 [2022-05-11T15:36:53.708309+02:00] [debug] autotest received signal TERM, saving results of current test before exiting 
 [2022-05-11T15:36:53.708518+02:00] [debug] isotovideo received signal TERM 
 [2022-05-11T15:36:53.708516+02:00] [debug] backend got TERM 
 ``` 

 Note that the job spent about 110 minutes in `wait_screen_change()`, which was supposed to time out after 10 seconds. 
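 The 110 minutes can be checked directly from the two timestamps in the log above. The following is only a small sketch for that arithmetic; the helper name `stuck_minutes` is made up for illustration and is not part of openQA or os-autoinst.

```python
from datetime import datetime


def stuck_minutes(call_ts: str, term_ts: str) -> float:
    """Return the minutes elapsed between two ISO 8601 log timestamps."""
    start = datetime.fromisoformat(call_ts)
    end = datetime.fromisoformat(term_ts)
    return (end - start).total_seconds() / 60


# Timestamps copied from the log excerpt: the wait_screen_change() call
# and the first "received signal TERM" line.
elapsed = stuck_minutes(
    "2022-05-11T13:46:24.602260+02:00",
    "2022-05-11T15:36:53.708309+02:00",
)
print(f"{elapsed:.1f} minutes")  # 110.5 minutes
```

 That is several hundred times the requested 10 second timeout, so the call was clearly blocked somewhere below the timeout handling rather than just running slowly.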

 In another job it was stuck on `assert_screen`: 

 ``` 
 [2022-05-11T15:58:59.169408+02:00] [debug] <<< testapi::assert_screen(mustmatch="installation", no_wait=1, timeout=30)
 [2022-05-11T16:15:16.486959+02:00] [warn] !!! consoles::VNC::catch {...} : Error in VNC protocol - relogin: short read for zrle data 659 - 950
 [2022-05-11T21:47:05.966962+02:00] [debug] backend got TERM
 [2022-05-11T21:47:05.966980+02:00] [debug] isotovideo received signal TERM
 [2022-05-11T21:47:05.967084+02:00] [debug] autotest received signal TERM, saving results of current test before exiting 
 XIO:    fatal IO error 11 (Resource temporarily unavailable) on X server ":44341" 
       after 28689 requests (28689 known processed) with 0 events remaining. 
 XIO:    fatal IO error 11 (Resource temporarily unavailable) on X server ":60785" 
       after 39307 requests (39307 known processed) with 0 events remaining. 
 ``` 
 Here we even got a VNC error, and the VNC client would try to re-login, but I suppose that is pointless because the VNC server terminates when the connection is lost anyway. So for a real retry we would likely also need to restart the VNC server. (Note that @MDoucha tried to connect manually here, see https://progress.opensuse.org/issues/99345#note-9)
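
 To make the "real retry" idea concrete, here is a rough sketch of a bounded reconnect loop that restarts the server before each retry. It is an assumption about how such a retry could look, not how os-autoinst works: `reconnect_with_restart`, `connect` and `restart_server` are hypothetical names, and the two callables are stand-ins for whatever actually opens the VNC session and restarts the VNC server on the SUT.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect_with_restart(
    connect: Callable[[], T],
    restart_server: Callable[[], None],
    attempts: int = 3,
) -> Optional[T]:
    """Try to connect up to `attempts` times, restarting the server before each retry.

    Returns the connection object, or None if every attempt failed.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    for attempt in range(attempts):
        if attempt > 0:
            # The old server is assumed to be gone once the connection dropped,
            # so bring up a fresh one before trying again.
            restart_server()
        try:
            return connect()
        except OSError:
            # Covers ConnectionError and TimeoutError; move on to the next attempt.
            continue
    return None


# Demo with fakes: the first two connection attempts fail, the third succeeds.
restarts = []
state = {"tries": 0}


def fake_restart() -> None:
    restarts.append("restart")


def fake_connect() -> str:
    state["tries"] += 1
    if state["tries"] < 3:
        raise ConnectionError("connection lost")
    return "session"


result = reconnect_with_restart(fake_connect, fake_restart, attempts=3)
print(result, len(restarts))  # session 2
```

 The point of the fixed attempt count is that the loop always ends, so a dead server leads to a quick failure instead of a job that sits until MAX_JOB_TIME.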
