Project

General

Profile

action #109112

Updated by okurz 5 months ago

Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host

## Observation
We have issues in multiple scenarios in first boot when connecting to PowerVM (and also we found some ipmi job).
The first test that tries to run `select_console('root-console');` fails.

In ppc64le PowerVM:
https://openqa.suse.de/tests/8418948#step/validate_lvm/1
https://openqa.suse.de/tests/8420902#step/system_prepare/1
https://openqa.suse.de/tests/8420907#step/validate_partition_table_via_blkid/1
https://openqa.suse.de/tests/8420908#step/validate_lvm/1
https://openqa.suse.de/tests/8420920#step/validate_partition_table_via_parted/1

From logs:
```
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":51899"
after 28647 requests (28647 known processed) with 0 events remaining.
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":51899"
[2022-03-28T13:45:12.281350+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 173481 and exit status: 1
[2022-03-28T13:45:12.282681+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 174616 and exit status: 84
[2022-03-28T13:45:12.282797+02:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 174619 and exit status: 0
[2022-03-28T13:45:12.461944+02:00] [debug] Connected to Xvnc - PID 177124
icewm PID is 177169
[2022-03-28T13:45:13.468637+02:00] [debug] Wait for SSH on host redcurrant-4.qa.suse.de (timeout: 120)
[2022-03-28T13:47:13.688450+02:00] [debug] redcurrant-4.qa.suse.de does not seems to have an active SSH server. Continuing anyway.
xterm PID is 178945
[2022-03-28T13:47:13.696027+02:00] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="redcurrant-4.qa.suse.de")
[2022-03-28T13:47:13.696288+02:00] [debug] <<< backend::baseclass::new_ssh_connection(password="SECRET", hostname="redcurrant-4.qa.suse.de", username="root")
[2022-03-28T13:47:14.840534+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds...
[2022-03-28T13:47:27.960550+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds...
[2022-03-28T13:47:41.070671+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds...
[2022-03-28T13:47:54.190507+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds...
[2022-03-28T13:48:07.320520+02:00] [debug] Could not connect to root@redcurrant-4.qa.suse.de, Retrying after some seconds...
[2022-03-28T13:48:17.325260+02:00] [debug] post_fail_hook failed: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host at /usr/lib/os-autoinst/testapi.pm line 1759.

testapi::select_console("root-ssh") called at sle/lib/Utils/Backends.pm line 83
```

In x86_64 ipmi: https://openqa.suse.de/tests/8420870#step/system_prepare/1

## Acceptance criteria
* **AC1:** Significantly higher code coverage in https://app.codecov.io/gh/os-autoinst/os-autoinst/blob/master/consoles/sshXtermVt.pm
* **AC2:** The typo is gone, e.g. just everything removed :)

## Suggestions

* We accept the hypothesis that the jobs just failed due to lower level network issues #108845 which already received a fix meanwhile so nothing to do for the immediate root cause
* We can improve though:
* There is a typo to fix in the message "does not seems"
* Do not continue after ssh connect fails
* But be explicit about the root cause. The test finally aborts with "No route to host" so we should have access to that message. for example in https://github.com/os-autoinst/os-autoinst/blob/master/consoles/sshXtermVt.pm#L60 make sure that the error details (underlying error message in $! or $@) are used for a better error message
* Make sure that we have unit test coverage with some mocking for this behaviour

Back