action #69328
closed
[o3][s390x] Early fail on s390x workers: connection refused
Description
Observation
openQA test in scenario opensuse-Tumbleweed-DVD-s390x-textmode@s390x fails in bootloader_s390
# Test died: expected command exit status ok, got error at /usr/lib/os-autoinst/consoles/s3270.pm line 102.
consoles::s3270::send_3270(consoles::s3270=HASH(0x564b2c90c3e8), "Connect(192.168.112.9)") called at /usr/lib/os-autoinst/consoles/s3270.pm line 375
consoles::s3270::_connect_3270(consoles::s3270=HASH(0x564b2c90c3e8), "192.168.112.9") called at /usr/lib/os-autoinst/consoles/s3270.pm line 438
consoles::s3270::connect_and_login(consoles::s3270=HASH(0x564b2c90c3e8)) called at /usr/lib/os-autoinst/consoles/s3270.pm line 506
consoles::s3270::activate(consoles::s3270=HASH(0x564b2c90c3e8)) called at /usr/lib/os-autoinst/consoles/console.pm line 97
consoles::console::select(consoles::s3270=HASH(0x564b2c90c3e8)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 612
backend::baseclass::try {...} () called at /usr/lib/perl5/vendor_perl/5.26.1/Try/Tiny.pm line 100
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Try/Tiny.pm line 93
Try::Tiny::try(CODE(0x564b2b529808), Try::Tiny::Catch=REF(0x564b2bc1a8a0)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 616
backend::baseclass::select_console(backend::s390x=HASH(0x564b2b9c3998), HASH(0x564b2c4dec60)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 89
backend::baseclass::handle_command(backend::s390x=HASH(0x564b2b9c3998), HASH(0x564b2ca34000)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 570
backend::baseclass::check_socket(backend::s390x=HASH(0x564b2b9c3998), IO::Handle=GLOB(0x564b2b8f8bf0), 0) called at /usr/lib/os-autoinst/backend/s390x.pm line 69
backend::s390x::check_socket(backend::s390x=HASH(0x564b2b9c3998), IO::Handle=GLOB(0x564b2b8f8bf0), 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 276
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 191
backend::baseclass::run_capture_loop(backend::s390x=HASH(0x564b2b9c3998)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 146
backend::baseclass::run(backend::s390x=HASH(0x564b2b9c3998), 13, 16) called at /usr/lib/os-autoinst/backend/driver.pm line 88
backend::driver::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x564b2c846f38)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x564b2c846f38), CODE(0x564b2694a508)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x564b2c846f38)) called at /usr/lib/os-autoinst/backend/driver.pm line 90
backend::driver::start(backend::driver=HASH(0x564b2b806dd0)) called at /usr/lib/os-autoinst/backend/driver.pm line 54
backend::driver::new("backend::driver", "s390x") called at /usr/bin/isotovideo line 233
main::init_backend() called at /usr/bin/isotovideo line 284
This looks very similar to an issue from 2017 - https://progress.opensuse.org/issues/25662
The community interested in s390x Tumbleweed is currently completely blocked by this issue, as nothing can be said about the actual quality of the distribution.
See also https://lists.opensuse.org/opensuse-project/2020-07/msg00127.html
Test suite description
Maintainer: okurz
Installation in textmode and selecting the textmode "desktop" during installation.
Reproducible
Fails since (at least) Build 20200428
Expected result
Last good: 20200425 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by SLindoMansilla over 4 years ago
- Subject changed from Early fail on s390x workers: connection refused to [o3][s390x] Early fail on s390x workers: connection refused
Updated by SLindoMansilla over 4 years ago
I talked to Mike Friesenegger and found out that the old linux128 guest was moved from zvm63b.openqanet.opensuse.org (192.168.112.9) to s390zlp1.suse.de (10.161.159.101).
(Note the different networks.)
We believe that zlp1 does not have a vswitch attached to the 192.168.112.0 network.
We will need to confirm with Ihno and Gerhard. If they agree then changes may need to be made to s390zlp1 or linux128 guest.
FYI: Berthold set up the old machine network.
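The network mismatch described above could be checked from the worker host with a quick TCP probe. This is only a minimal sketch: it assumes the z/VM console is reached on TCP port 23 (the usual telnet/TN3270 port; the ticket does not state the port) and that bash and coreutils `timeout` are available. The helper name `probe` is made up for illustration.

```shell
# probe HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Hypothetical helper, not part of os-autoinst.
probe() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Port 23 is an assumption; the two addresses are the ones named in this ticket.
for host in 192.168.112.9 10.161.159.101; do
    if probe "$host" 23; then
        echo "$host: port 23 reachable"
    else
        echo "$host: port 23 not reachable (refused, filtered or timed out)"
    fi
done
```

A refused or timed-out connection to the address configured as ZVM_HOST would be consistent with the failing `Connect(192.168.112.9)` call in the trace, but it does not tell which side of the network needs to change.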
Updated by azouhr about 4 years ago
So, we have not had a single openQA run in more than half a year. Could you please come up with a solution NOW? The whole distribution seems to be deteriorating right now, and there is nothing I can do about it without openQA.
This should not take years or weeks, but maybe days or hours.
Updated by SLindoMansilla about 4 years ago
azouhr wrote:
So, we have not had a single openQA run in more than half a year. Could you please come up with a solution NOW? The whole distribution seems to be deteriorating right now, and there is nothing I can do about it without openQA.
This should not take years or weeks, but maybe days or hours.
Hi azouhr,
I am really sorry, but there is nothing I can do until Ihno sets up the machines, and Ihno does not seem to be looking at this ticket. I have asked him so many times that I got bored. Maybe we can ask Mike to raise it with Ihno again? I will send another email and add you in CC.
Updated by AdaLovelace about 4 years ago
I have been working for IBM since October and have received the approval for openSUSE contributions at the end of the month. This week we had a small discussion about how to proceed.
IBM sees it as a win-win situation to have an openSUSE member as an employee. We want to improve the cooperation this way.
A non-working mainframe does not only affect openSUSE. You cannot test SLES on Z either.
We test all the products we develop for Z (for all distribution partners). Do you want to improve the partnership, too?
Then you should have a working mainframe.
Updated by okurz about 4 years ago
- Due date set to 2020-11-13
- Status changed from New to In Progress
- Assignee changed from mgriessmeier to okurz
- Target version set to Ready
I participated in an "ad-hoc openSUSE s390x meeting" called by lkocman; see the minutes at
https://lists.opensuse.org/opensuse-factory/2020-11/msg00067.html
@lkocman thanks for taking care. This also triggered AdaLovelace's reaction ;)
Ihno (SUSE) looked into the problem on the SUSE side, in the mainframe VM network configuration. I gave him "operator" permissions on o3 so he could retrigger the openQA test scenario https://openqa.opensuse.org/tests/latest?arch=s390x&distri=opensuse&flavor=DVD&machine=s390x&test=textmode&version=Tumbleweed and we have made some progress, as visible in https://openqa.opensuse.org/tests/1464182#step/bootloader_s390/1 . https://openqa.opensuse.org/tests/1464182#step/bootloader_s390/29 shows that the necessary details were entered in x3270 and that we end up on a repo URL selection screen.
Ihno asked me to configure "LINUX144, LINUX145, LINUX146, LINUX147" instead of LINUX128.
I have asked him to clarify whether I should enable all of these and remove linux128. For now I have just added the additional hosts and not removed linux128 yet. The corresponding content of rebel:/etc/openqa/workers.ini is:
[1]
WORKER_CLASS = s390x,s390x-rebel-1-linux128
BACKEND=s390x
S390_HOST=128
ZVM_GUEST=linux128
ZVM_PASSWORD=lin390
ZVM_HOST=192.168.112.9
[2]
WORKER_CLASS = s390x,s390x-rebel-1-linux144
BACKEND=s390x
S390_HOST=144
ZVM_GUEST=linux144
ZVM_PASSWORD=lin390
ZVM_HOST=192.168.112.9
[3]
WORKER_CLASS = s390x,s390x-rebel-1-linux145
BACKEND=s390x
S390_HOST=145
ZVM_GUEST=linux145
ZVM_PASSWORD=lin390
ZVM_HOST=192.168.112.9
[4]
WORKER_CLASS = s390x,s390x-rebel-1-linux146
BACKEND=s390x
S390_HOST=146
ZVM_GUEST=linux146
ZVM_PASSWORD=lin390
ZVM_HOST=192.168.112.9
[5]
WORKER_CLASS = s390x,s390x-rebel-1-linux147
BACKEND=s390x
S390_HOST=147
ZVM_GUEST=linux147
ZVM_PASSWORD=lin390
ZVM_HOST=192.168.112.9
Called on rebel:
systemctl restart openqa-worker@1
systemctl enable --now openqa-worker@{2..5}
and then from my machine:
cnt=0; for i in 128 144 145 146 147 ; do cnt=$((cnt+1)); openqa-clone-job --within-instance https://openqa.opensuse.org 1464182 WORKER_CLASS=s390x-rebel-$cnt-linux$i; done
resulting in
Created job #1464467: opensuse-Tumbleweed-DVD-s390x-Build20201106-textmode@s390x -> https://openqa.opensuse.org/t1464467
Created job #1464468: opensuse-Tumbleweed-DVD-s390x-Build20201106-textmode@s390x -> https://openqa.opensuse.org/t1464468
Created job #1464469: opensuse-Tumbleweed-DVD-s390x-Build20201106-textmode@s390x -> https://openqa.opensuse.org/t1464469
Created job #1464470: opensuse-Tumbleweed-DVD-s390x-Build20201106-textmode@s390x -> https://openqa.opensuse.org/t1464470
Created job #1464471: opensuse-Tumbleweed-DVD-s390x-Build20201106-textmode@s390x -> https://openqa.opensuse.org/t1464471
All jobs seem to reach at least the same state. After clarification with Ihno (SUSE) I have removed the configuration for linux128 now and adjusted rebel:/etc/openqa/workers.ini accordingly to have four instances, linux144, linux145, linux146, linux147. I will document accordingly in our internal openQA infrastructure document https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls as well.
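The ticket does not quote the final four-instance file. As a rough sketch of what it might contain, the sections could be generated as below; the helper name `gen_worker_sections` is hypothetical, renumbering the sections from 1 is an assumption, and the password is read from an environment variable here rather than hard-coded (also an assumption).

```shell
# gen_worker_sections GUEST...: print one workers.ini section per z/VM guest number.
# Hypothetical helper; keys and values copy the file quoted earlier in this comment.
gen_worker_sections() {
    n=0
    for guest in "$@"; do
        n=$((n + 1))
        printf '[%d]\n' "$n"
        # The quoted file uses "rebel-1" in every class name, so it is kept literally.
        printf 'WORKER_CLASS = s390x,s390x-rebel-1-linux%s\n' "$guest"
        printf 'BACKEND=s390x\n'
        printf 'S390_HOST=%s\n' "$guest"
        printf 'ZVM_GUEST=linux%s\n' "$guest"
        # Assumption: the password comes from the environment; CHANGEME is a placeholder.
        printf 'ZVM_PASSWORD=%s\n' "${ZVM_PASSWORD:-CHANGEME}"
        printf 'ZVM_HOST=192.168.112.9\n'
    done
}

gen_worker_sections 144 145 146 147
```

Note that the clone loop above passes WORKER_CLASS values built from a running counter (s390x-rebel-$cnt-linux$i), whereas the quoted file has "rebel-1" in every section; whichever naming is used, the class a job requests has to match one configured in workers.ini.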
Updated by SLindoMansilla about 4 years ago
AdaLovelace wrote:
I have been working for IBM since October and have received the approval for openSUSE contributions at the end of the month. This week we had a small discussion about how to proceed.
IBM sees it as a win-win situation to have an openSUSE member as an employee. We want to improve the cooperation this way.
A non-working mainframe does not only affect openSUSE. You cannot test SLES on Z either.
We test all the products we develop for Z (for all distribution partners). Do you want to improve the partnership, too?
Then you should have a working mainframe.
Hi, just to clarify: the setups for openSUSE tests and SLE tests are separate. Tests for SLE were working; only the openSUSE setup was not.
Updated by okurz about 4 years ago
- Status changed from In Progress to Resolved
Meanwhile, late yesterday evening tests have progressed further, best example https://openqa.opensuse.org/tests/1464522#step/await_install/68 which is clearly a product bug that should be reported as such. I had a 1.5h session with "AdaLovelace", Sarah from IBM. I gave her a crash course in "openSUSE s390x release management – the review part". She will keep up with this topic, so hopefully we will see a better state of s390x in the near future.
I have updated the internal reference for the o3 reserved z/VM guests in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/271 (merged). With this I regard this specific ticket as "Resolved". Any other problems in the s390x test scenarios should get their own specific tickets. E.g. currently I see tests ending up incomplete, unable to resolve the test variable "WORKER_CLASS".
Updated by SLindoMansilla about 4 years ago
okurz wrote:
Meanwhile, late yesterday evening tests have progressed further, best example https://openqa.opensuse.org/tests/1464522#step/await_install/68 which is clearly a product bug that should be reported as such.
Updated by SLindoMansilla about 4 years ago
- Related to action #77116: test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel) added
Updated by okurz about 4 years ago
- Related to action #77209: workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service added