coordination #99030

Updated by livdywan over 2 years ago

## Observation 

 In recent test runs we are seeing the following error multiple times in a row:  

 ~~~ 
 backend died: Lost SSH connection to SUT: Failure while draining incoming flow at /usr/lib/os-autoinst/consoles/ssh_screen.pm line 89. 
 ~~~ 

 See http://duck-norris.qam.suse.de/tests/7274 (and the cloned jobs within). Other jobs are also affected, e.g. https://openqa.suse.de/tests/7171999 and http://openqa.qam.suse.cz/tests/28072 

 No workaround is possible and this is a major blocker, as some of the bare-metal test runs are not able to complete. 

 ## Steps to reproduce 

 Run a bare-metal test run on conan.qam.suse.de (high chance of failure) or on openqa.qam.suse.cz (lower chance of failure). 

 ## Acceptance criteria 
 - **AC1**: Bare metal tests don't die 

 ## Problem 

 Hypothesis: ssh connection drops. 

 ## Suggestion 

 * Fix the network 
 * Provide a way for the backend to reconnect a dropped ssh connection 
 * Use [auto-review](https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger) with automatic retriggering, which removes the need to retrigger failed jobs by hand and gives us more data about affected jobs/machines/architectures 
 * Roll back package updates, i.e. to older rpms that we have stored in local caches or btrfs snapshots of the root fs. 
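
 The reconnect suggestion above could look roughly like the following. This is only a minimal sketch of the retry-with-backoff idea in Python, not the os-autoinst backend code (which is Perl). `ConnectionLost`, `call_with_reconnect` and the `operation` and `reconnect` callbacks are hypothetical names introduced for illustration, and the attempt count and delays are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLost(Exception):
    """Hypothetical error raised when the SSH session to the SUT drops."""


def call_with_reconnect(
    operation: Callable[[], T],
    reconnect: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation; on ConnectionLost, wait, reconnect and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLost:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            reconnect()
    raise AssertionError("unreachable")
```

 With something like this, the job would only die after the connection has failed `max_attempts` times in a row, which also helps to tell a short network hiccup apart from a SUT that is really gone.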

 ## Workaround 

 * None possible. This has a major impact on our virtualization test runs and (probably) other bare-metal test runs as well.
