action #137384

Updated by okurz 7 months ago

## Observation 

 I think there are two variants of the same issue. 
 I have only seen it failing on imagetester.qe.nue2.suse.org. 
 bootloader_start is much worse because it does not fail but times out at MAX_JOB_TIME. 
 [bootloader_start](https://openqa.suse.de/tests/12369985) 
 [bootloader_zkvm](https://openqa.suse.de/tests/12369978/modules/bootloader_zkvm/steps/3) 

 fails with 
 `Reason: backend done: Error connecting to <root@s390zl14.suse.de>: Connection timed out` 
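The backend gave up while opening an SSH connection to s390zl14.suse.de. To separate a network problem from an SSH or authentication problem, you can try a plain TCP connect to port 22 from the affected worker. Below is a minimal bash sketch. It assumes bash's `/dev/tcp` redirection and coreutils `timeout` are available on the worker. The function name `probe_port` and its defaults are hypothetical and not part of openQA or os-autoinst.

```shell
# Hypothetical helper, not part of openQA/os-autoinst: try a plain TCP connect
# to HOST:PORT (default: port 22, i.e. SSH) and report the outcome.
# Assumes bash (for the /dev/tcp pseudo-path) and coreutils `timeout`.
probe_port() {
    local host=$1 port=${2:-22} secs=${3:-5} rc
    # Opening /dev/tcp/HOST/PORT makes bash perform a TCP connect; `timeout`
    # bounds the attempt so silently dropped packets cannot hang the probe.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        rc=0
    else
        rc=$?
    fi
    case $rc in
        0)   echo "$host:$port accepts TCP connections" ;;
        124) echo "$host:$port timed out after ${secs}s" ;;  # exit code of `timeout` on expiry
        *)   echo "$host:$port connect failed (e.g. refused, no route, unknown host)" ;;
    esac
    return "$rc"
}
```

Usage: `probe_port s390zl14.suse.de 22 5`. A "timed out" result is consistent with the "Connection timed out" in the job log. An immediate failure instead points to a host rejecting the connection, a routing problem or a DNS issue.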

 ## Reproducible 

 Fails since (at least) Build [20231003-1](https://openqa.suse.de/tests/12369978) (current job) 

 Find jobs referencing this ticket with the help of 
 https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , 
 `openqa-query-for-job-label poo#137384` 


 ## Expected result 

 Last good: [20231002-1](https://openqa.suse.de/tests/12363002) (or more recent) 

 


 ## Acceptance Criteria 
 * **AC1**: It is known if s390zl14.suse.de is usable as a production worker 

 ## Suggestions 
 * As the error clearly states a "connection timed out" to s390zl14.suse.de, check whether that machine is generally reachable, e.g. `sudo salt \* cmd.run 'ping -c1 s390zl14.suse.de'` 
   * `sudo salt --no-color --out txt '*' cmd.run 'ping -c1 s390zl14.suse.de'` reports "unreachable" or 100% packet loss, so the problem is not specific to imagetester 
 * Also check our monitoring, which includes a ping check to various important hosts, and see whether s390zl14 is covered there; we did not see an alert 
 * If the problem persists, consider disabling the production use of the worker instance in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls , e.g. `s/390-kvm/&-poo137384/` 
 * Maybe s390zl14 is not expected to be usable in general: cross-check the git history of https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls and ask the people who introduced it, who maintain the machine, or who maintain the corresponding tests 
   * From commit 6d0f181b60681956847458369f92f636a7449e4b "Remove s390zl14 as required external host from monitoring" and the discussion that followed, it seems this mainframe is expected not to be usable for us 
   * https://racktables.nue.suse.com/?page=search&last_page=index&last_tab=default&q=s390zl14 says "nothing found", so ask around what that machine is supposed to be used for 
 * If the machine is not supposed to be working, remove the corresponding openQA worker instance and ensure that someone takes care that s390zl14 is properly used outside the context of OSD, so that the hardware is not just uselessly wasting power and destroying our nice earth 
 * If the machine is supposed to work as an OSD worker target, create a corresponding Eng-Infra ticket 
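Two of the mechanical steps in the suggestions can be sketched in shell. The sketch assumes a local clone of salt-pillars-openqa as the working directory.

* `host_history` uses git's pickaxe search (`git log -S`) to list the commits that added or removed a string such as `s390zl14`.
* `disable_worker_class` shows what the suggested `s/390-kvm/&-poo137384/` substitution does. In a sed replacement, `&` stands for the whole matched text.

Both function names are hypothetical. The sample line in the comments is illustrative and not copied from workerconf.sls.

```shell
# Hypothetical helpers; run from a local clone of salt-pillars-openqa.

# List commits whose diff adds or removes NEEDLE (git's "pickaxe" search),
# optionally limited to the given paths, e.g.:
#   host_history s390zl14 openqa/workerconf.sls
#   host_history s390zl14          # whole repository
host_history() {
    local needle=$1
    shift
    git log --oneline -S "$needle" -- "$@"
}

# Apply the suggested substitution to stdin. In sed, "&" in the replacement
# is the matched text, so "390-kvm" becomes "390-kvm-<ticket>"; e.g. an
# (illustrative) line "WORKER_CLASS: s390-kvm" turns into
# "WORKER_CLASS: s390-kvm-poo137384". Only the first match per line changes.
disable_worker_class() {
    local ticket=${1:-poo137384}
    sed "s/390-kvm/&-${ticket}/"
}
```

A plain `sed -i` over the whole file would tag every line containing `390-kvm`, not just the s390zl14 instance. The substitution should therefore only be applied to that instance's lines, for example by editing that section by hand.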

 ## Out of scope 
 * test code improvement, see #137387 
 * Looking into fixing the machine and improving the setup 

 ## Further details 

 Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Server-DVD-Updates&machine=s390x-kvm&test=docker_tests&version=15-SP3) 


 
