action #137384
[tools][s390x] worker imagetester can't reach SUT auto_review:"backend done: Error connecting to <root@s390zl14.suse.de>: Connection timed out" size:M
Status: Resolved
Priority: High
Assignee:
Category: Infrastructure
Target version:
Start date: 2023-10-04
Due date:
% Done: 0%
Estimated time:
Difficulty:
Tags:
Description
Observation
I think there are two variants of the same issue. I have seen it fail only on imagetester.qe.nue2.suse.org. The bootloader_start variant is much worse because the job does not fail but times out at MAX_JOB_TIME.
- bootloader_start
- bootloader_zkvm, which fails with:
Reason: backend done: Error connecting to <root@s390zl14.suse.de>: Connection timed out
Reproducible
Fails since (at least) Build 20231003-1 (current job)
Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label by calling:
openqa-query-for-job-label poo#137384
Expected result
Last good: 20231002-1 (or more recent)
Acceptance Criteria
- AC1: It is known if s390zl14.suse.de is usable as a production worker
Suggestions
- As the error plainly states "Connection timed out" for s390zl14.suse.de, check whether that machine is reachable at all, e.g. with
sudo salt \* cmd.run 'ping -c1 s390zl14.suse.de'
or, with plain-text output,
sudo salt --no-color --out txt '*' cmd.run 'ping -c1 s390zl14.suse.de'
This reports "unreachable" or "100% packet loss", so the problem is not specific to imagetester.
- Also check our monitoring, which includes a ping check to various important hosts, and see whether s390zl14 is covered there; we did not see an alert
- If the problem persists, consider disabling production use of the worker instance in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls, e.g. by applying the substitution
s/390-kvm/&-poo137384/
- Maybe s390zl14 is not expected to be usable in general. Cross-check the git history of https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls and ask the people who introduced it, who maintain the machine, or who maintain the corresponding tests
- Commit 6d0f181b60681956847458369f92f636a7449e4b "Remove s390zl14 as required external host from monitoring" and the discussion following it suggest that this mainframe is not expected to be usable for us
- A search on https://racktables.nue.suse.com/?page=search&last_page=index&last_tab=default&q=s390zl14 returns "nothing found", so ask around what that machine is supposed to be used for
- If the machine is not supposed to be working, remove the corresponding openQA worker instance and make sure that someone ensures s390zl14 is put to proper use outside the context of OSD, so that no hardware is left uselessly wasting power and destroying our nice earth
- If the machine is supposed to work as an OSD worker target, create a corresponding Eng-Infra ticket
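The ping check in the list above only covers ICMP, while the backend error comes from an SSH connection to root@s390zl14.suse.de. A complementary check is whether TCP port 22 accepts connections at all. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and GNU coreutils timeout are available; the function name check_ssh_port and the default 5 second timeout are hypothetical choices, not part of any existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not existing tooling): test whether HOST:PORT accepts a
# TCP connection within SECS seconds. Uses bash's /dev/tcp redirection and
# GNU coreutils `timeout`.
check_ssh_port() {
    local host="$1"
    local port="${2:-22}"   # default to SSH, which the backend connects to
    local secs="${3:-5}"    # assumed timeout, adjust as needed
    if [ -z "$host" ]; then
        echo "usage: check_ssh_port HOST [PORT] [SECS]" >&2
        return 2
    fi
    # The child bash opens /dev/tcp/HOST/PORT, i.e. attempts a TCP connect;
    # `timeout` kills it if the connect hangs (the "Connection timed out" case).
    if timeout "$secs" bash -c ': </dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null; then
        echo "${host}:${port} reachable"
        return 0
    fi
    echo "${host}:${port} unreachable (refused, unresolved or timed out)"
    return 1
}

# Usage, e.g. from the affected worker:
#   check_ssh_port s390zl14.suse.de
```

A non-zero status from this check on every worker would point at the host or network rather than at imagetester, matching what the ping output already suggests.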
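The substitution s/390-kvm/&-poo137384/ suggested in the list above uses sed syntax, where & in the replacement stands for the whole matched text, so 390-kvm becomes 390-kvm-poo137384. Presumably the intent is that jobs asking for the original worker class are no longer assigned to this instance. A minimal sketch of the effect, assuming a POSIX or GNU sed; the helper name disable_with_ticket and the sample WORKER_CLASS line are hypothetical and may not match the real layout of workerconf.sls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the ticket's substitution to text on stdin.
# In the replacement, "&" is the matched text, so "390-kvm" becomes
# "390-kvm-poo<TICKET>". Without the "g" flag only the first match per
# line is changed.
disable_with_ticket() {
    local ticket="${1:-137384}"
    sed "s/390-kvm/&-poo${ticket}/"
}

# Illustrative input only (not copied from workerconf.sls):
#   echo 'WORKER_CLASS: s390-kvm' | disable_with_ticket
#   prints: WORKER_CLASS: s390-kvm-poo137384
```

GNU sed's -i option would apply the same expression in place, e.g. sed -i 's/390-kvm/&-poo137384/' openqa/workerconf.sls, but that touches every line containing 390-kvm, so restricting the change to the affected s390zl14 instance by hand is probably safer.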
Out of scope
- test code improvement, see #137387
- Looking into fixing the machine and improving the setup
Further details
Always latest result in this scenario: latest