action #109112

Updated by okurz about 2 years ago

Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host 

 ## Observation 
 We see failures in multiple scenarios on first boot when connecting to PowerVM machines, and we also found an affected ipmi job. 
 The first test that tries to run `select_console('root-console');` fails. 

 In ppc64le PowerVM: 
 https://openqa.suse.de/tests/8418948#step/validate_lvm/1 
 https://openqa.suse.de/tests/8420902#step/system_prepare/1 
 https://openqa.suse.de/tests/8420907#step/validate_partition_table_via_blkid/1 
 https://openqa.suse.de/tests/8420908#step/validate_lvm/1 
 https://openqa.suse.de/tests/8420920#step/validate_partition_table_via_parted/1 

 From logs: 
 ``` 
 XIO:    fatal IO error 11 (Resource temporarily unavailable) on X server ":51899" 
       after 28647 requests (28647 known processed) with 0 events remaining. 
 xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":51899" 
 [2022-03-28T13:45:12.281350+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 173481 and exit status: 1 
 [2022-03-28T13:45:12.282681+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 174616 and exit status: 84 
 [2022-03-28T13:45:12.282797+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 174619 and exit status: 0 
 [2022-03-28T13:45:12.461944+02:00] [debug] Connected to Xvnc - PID 177124 
 icewm PID is 177169 
 [2022-03-28T13:45:13.468637+02:00] [debug] Wait for SSH on host redcurrant-4.qa.suse.de (timeout: 120) 
 [2022-03-28T13:47:13.688450+02:00] [debug] redcurrant-4.qa.suse.de does not seems to have an active SSH server. Continuing anyway. 
 xterm PID is 178945 
 [2022-03-28T13:47:13.696027+02:00] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="redcurrant-4.qa.suse.de") 
 [2022-03-28T13:47:13.696288+02:00] [debug] <<< backend::baseclass::new_ssh_connection(password="SECRET", hostname="redcurrant-4.qa.suse.de", username="root") 
 [2022-03-28T13:47:14.840534+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds... 
 [2022-03-28T13:47:27.960550+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds... 
 [2022-03-28T13:47:41.070671+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds... 
 [2022-03-28T13:47:54.190507+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds... 
 [2022-03-28T13:48:07.320520+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds... 
 [2022-03-28T13:48:17.325260+02:00] [debug] post_fail_hook failed: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host at /usr/lib/os-autoinst/testapi.pm line 1759. 
       
  	 testapi::select_console("root-ssh") called at sle/lib/Utils/Backends.pm line 83 
 ``` 

 In x86_64 ipmi: https://openqa.suse.de/tests/8420870#step/system_prepare/1  

 ## Acceptance criteria 
 * **AC1:** Significantly higher code coverage in https://app.codecov.io/gh/os-autoinst/os-autoinst/blob/master/consoles/sshXtermVt.pm 
 * **AC2:** The typo is gone, e.g. because the whole message was removed :) 

 ## Suggestions 

 * We accept the hypothesis that the jobs failed due to lower-level network issues (#108845), which have already received a fix in the meantime, so there is nothing to do for the immediate root cause 
 * We can improve though: 
     * There is a typo to fix in the message "does not seems" 
     * Do not continue after the SSH connection fails 
     * Be explicit about the root cause instead. The test finally aborts with "No route to host", so we should have access to that message. For example, in https://github.com/os-autoinst/os-autoinst/blob/master/consoles/sshXtermVt.pm#L60 make sure that the error details (the underlying error message in `$!` or `$@`) are used for a better error message 
     * Make sure that we have unit test coverage with some mocking for this behaviour
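
 The behaviour suggested above could look roughly like the following sketch. It is written in Python only for illustration. The real code is Perl in `consoles/sshXtermVt.pm`, and the names `wait_for_ssh` and `SSHUnavailableError` as well as the injectable `connect`, `clock` and `sleep` parameters are hypothetical, not part of os-autoinst. The idea is to abort instead of "continuing anyway", to carry the underlying OS error in the message, and to make the function testable with mocks so that no real network or real waiting is needed.

```python
import socket
import time
from unittest import mock


class SSHUnavailableError(RuntimeError):
    """Raised when the SSH port cannot be reached within the timeout."""


def wait_for_ssh(host, port=22, timeout=120.0, interval=1.0,
                 connect=socket.create_connection,
                 clock=time.monotonic, sleep=time.sleep):
    """Return once host:port accepts a TCP connection.

    Raises SSHUnavailableError with the last underlying OS error
    instead of silently continuing after the timeout.
    """
    deadline = clock() + timeout
    last_error = None
    while True:
        try:
            conn = connect((host, port), timeout=interval)
            conn.close()
            return
        except OSError as err:
            last_error = err  # keep the root cause for the final message
        if clock() >= deadline:
            break
        sleep(interval)
    raise SSHUnavailableError(
        f"{host} does not seem to have an active SSH server "
        f"after {timeout:g}s: {last_error}"
    )


class FakeClock:
    """Deterministic clock so that tests do not really wait."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def test_failure_reports_root_cause():
    clock = FakeClock()
    # Mocked connect: every attempt fails like in the ticket's log.
    connect = mock.Mock(side_effect=OSError(113, "No route to host"))
    try:
        wait_for_ssh("redcurrant-4.qa.suse.de", timeout=3,
                     connect=connect, clock=clock.time, sleep=clock.sleep)
    except SSHUnavailableError as err:
        assert "No route to host" in str(err)
        assert "does not seem" in str(err)
    else:
        raise AssertionError("expected SSHUnavailableError")
    assert connect.call_count > 1  # it retried before giving up


def test_success_returns_immediately():
    clock = FakeClock()
    connect = mock.Mock()  # returns a mock connection object
    wait_for_ssh("redcurrant-4.qa.suse.de", timeout=3,
                 connect=connect, clock=clock.time, sleep=clock.sleep)
    connect.assert_called_once()
    connect.return_value.close.assert_called_once()
```

 Passing the connection function and the clock as parameters is one possible design choice. In the Perl code the same effect would more likely be achieved by mocking the relevant module functions in the unit test.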
