action #153057
closed
[tools] test fails in bootloader_start because openQA can not boot for s390x size:M
Description
Observation
We got a successful build of openSUSE Tumbleweed for s390x over the Christmas period.
openQA fails in the first steps, while preparing the bootloader and selecting the s3270 console:
Test died: expected command exit status ok, got error at /usr/lib/os-autoinst/consoles/s3270.pm line 75, <$fh> line 22.
consoles::s3270::send_3270(consoles::s3270=HASH(0x556b400f2428), "Connect(s390zl11.openqanet.opensuse.org)") called at /usr/lib/os-autoinst/consoles/s3270.pm line 315
consoles::s3270::_connect_3270(consoles::s3270=HASH(0x556b400f2428), "s390zl11.openqanet.opensuse.org") called at /usr/lib/os-autoinst/consoles/s3270.pm line 360
consoles::s3270::connect_and_login(consoles::s3270=HASH(0x556b400f2428)) called at /usr/lib/os-autoinst/consoles/s3270.pm line 424
consoles::s3270::activate(consoles::s3270=HASH(0x556b400f2428)) called at /usr/lib/os-autoinst/consoles/console.pm line 55
consoles::console::select(consoles::s3270=HASH(0x556b400f2428)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 660
backend::baseclass::try {...} () called at /usr/lib/perl5/vendor_perl/5.26.1/Try/Tiny.pm line 100
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Try/Tiny.pm line 93
Try::Tiny::try(CODE(0x556b4000b430), Try::Tiny::Catch=REF(0x556b408595e8)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 664
backend::baseclass::select_console(backend::s390x=HASH(0x556b40393d78), HASH(0x556b3fdf2ea8)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 79
backend::baseclass::handle_command(backend::s390x=HASH(0x556b40393d78), HASH(0x556b4089ff50)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 616
backend::baseclass::check_socket(backend::s390x=HASH(0x556b40393d78), IO::Handle=GLOB(0x556b40b26c28), 0) called at /usr/lib/os-autoinst/backend/s390x.pm line 41
backend::s390x::check_socket(backend::s390x=HASH(0x556b40393d78), IO::Handle=GLOB(0x556b40b26c28), 0) called at /usr/lib/os-autoinst/backend/baseclass.pm line 284
backend::baseclass::do_capture(backend::s390x=HASH(0x556b40393d78), undef, 1704270564.33364) called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 311
backend::baseclass::run_capture_loop(backend::s390x=HASH(0x556b40393d78)) called at /usr/lib/os-autoinst/backend/baseclass.pm line 133
backend::baseclass::run(backend::s390x=HASH(0x556b40393d78), 14, 17) called at /usr/lib/os-autoinst/backend/driver.pm line 68
backend::driver::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x556b408938c8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 329
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x556b408938c8), CODE(0x556b3f715d50)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x556b408938c8)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
backend::driver::start(backend::driver=HASH(0x556b40a88648)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
backend::driver::new("backend::driver", "s390x") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 100
OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x556b3b1b8a38)) called at /usr/bin/isotovideo line 134
openQA test in scenario opensuse-Tumbleweed-DVD-s390x-autoyast_zvm@s390x-zVM-vswitch-l2 fails in
bootloader_start
Have there been any openQA changes over the Christmas period that could affect the boot process for s390x?
Test suite description
Create HDD for s390x textmode
Reproducible
Fails since (at least) Build 20231228
Expected result
Last good: 20231115 (or more recent)
Suggestions
- See related ticket #137408 about the recent o3 setup
- Confirm what s390zl11.openqanet.opensuse.org is and where it should be reachable from
- Check the worker config
- Look up the IPMI config? There is no IPMI
- Check if there's an entry in pillars / Add a new entry
- Verify if this is a regression or a product issue
- Optional: Try to log in to the machine manually with x3270, the same way the openQA tests do
- Read https://progress.opensuse.org/projects/openqav3/wiki/#o3-s390-workers about the setup
- Ask Oliver for details about the worker, since he might have been the last to work on it (if he still remembers), and mgriessmeier and nicksinger
- Ask Ada
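For the reachability suggestion above, a quick check from the worker could look like the following. This is only a minimal sketch: the helper name `can_connect` is hypothetical, and port 23 is an assumption (the usual telnet/TN3270 default), not something confirmed in this ticket.

```python
import socket


def can_connect(host: str, port: int = 23, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 23 is an assumed default (telnet/TN3270); adjust it if the
    z/VM host listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


# Example call (not executed here, needs access to the o3 network):
# can_connect("s390zl11.openqanet.opensuse.org")
```

A `False` result only says that no TCP connection could be opened from wherever the script runs; it does not distinguish DNS, routing or a service that is down on the mainframe side.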
Further details
Always latest result in this scenario: latest
Updated by AdaLovelace 11 months ago
- Category changed from Bugs in existing tests to Infrastructure
Updated by livdywan 11 months ago
- Priority changed from Normal to High
- Target version set to Ready
All of the investigation jobs are failing and it's not sporadic. So I'm guessing it's either a regression in os-autoinst or a change in our production infrastructure - this would have to have been introduced between 2023-12-23 and 2024-01-01.
Updated by AdaLovelace 11 months ago
When can we receive a working openQA environment?
Updated by okurz 11 months ago
Likely the same issue was brought up in https://suse.slack.com/archives/C02CANHLANP/p1704826925398509?thread_ts=1704826925.398509&cid=C02CANHLANP .
AdaLovelace wrote in #note-4:
When can we receive a working openQA environment?
Likely in the next few weeks. The ticket was set to High only a few days ago, and our usual reaction time for High tickets leaves about 25 days before we would remind the team again to look into this.
Updated by okurz 11 months ago
- Related to action #137408: Support move of s390x mainframe(s) to PRG2 - o3 size:M added
Updated by okurz 11 months ago · Edited
https://openqa.opensuse.org/tests/3860852 passed again. mkittler asked mgriessmeier in https://suse.slack.com/archives/C02CANHLANP/p1704884374336609 and mgriessmeier asked gschlotter, who asked astalker, who apparently unplugged and replugged an SFP+ connection, which fixed the issue.
From o3 I can also now ping s390zl11:
okurz@new-ariel:~> ping s390zl11.openqanet.opensuse.org
PING s390zl11.openqanet.opensuse.org (10.150.1.41) 56(84) bytes of data.
64 bytes from s390zl11.openqanet.opensuse.org (10.150.1.41): icmp_seq=1 ttl=60 time=0.597 ms
I retriggered other failures in https://openqa.opensuse.org/tests/overview?build=20240107&groupid=34&version=Tumbleweed&distri=opensuse as well; if they pass, you can resolve the ticket.
It seems that on w23, services like container-openqaworker23_container_102.service are disabled. I suggest you check 102-104.
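A possible way to check slots 102-104 in one go is sketched below. The unit name pattern is taken from the service name mentioned above; the helper names `unit_names` and `is_enabled` are hypothetical, and the script assumes it runs on the worker host where `systemctl` is available.

```python
import subprocess

# Pattern derived from the unit name quoted in this ticket.
UNIT_TEMPLATE = "container-openqaworker23_container_{slot}.service"


def unit_names(slots):
    """Build the systemd unit names for the given worker slot numbers."""
    return [UNIT_TEMPLATE.format(slot=slot) for slot in slots]


def is_enabled(unit: str) -> bool:
    """Return True if `systemctl is-enabled` exits with status 0 for unit."""
    result = subprocess.run(
        ["systemctl", "is-enabled", "--quiet", unit], check=False
    )
    return result.returncode == 0


# Example (needs to run on the worker host):
# for unit in unit_names(range(102, 105)):
#     print(unit, "enabled" if is_enabled(unit) else "disabled")
```

Note that this only reports the enablement state; a disabled unit still has to be enabled and started by hand.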
Updated by mkittler 11 months ago · Edited
- Status changed from In Progress to Feedback
I enabled/started the other worker slots as well and it looks like it works, e.g. https://openqa.opensuse.org/tests/3860905 is currently running.
I also changed the hostnames from openqaworker1 to openqaworker23 in the Wiki.
Updated by AdaLovelace 11 months ago
Thank you!
Interesting that an SFTP connection was the issue. It is working again, so we can release again in the future.
I will adapt a needle today for a successful Tumbleweed release.
Updated by okurz 11 months ago
- Status changed from Feedback to Resolved
AdaLovelace wrote in #note-13:
Thank you!
Interesting that an SFTP connection was the issue. It is working again, so we can release again in the future.
No, not SFTP. SFP+ as in https://en.wikipedia.org/wiki/Small_Form-factor_Pluggable :)
With that we can resolve.
Updated by openqa_review 7 months ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: autoyast_zvm
https://openqa.opensuse.org/tests/4091874#step/bootloader_start/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by livdywan 7 months ago
- Status changed from Feedback to Workable
I guess @openqa_review is not aware that we don't want urgent tickets in Feedback 😜
I suppose there were no mitigations here?
{
"args" => [
"output_delim",
"(?^:Loading Installation System)",
"timeout",
300
],
"cmd" => "backend_proxy_console_call",
"console" => "x3270",
"function" => "expect_3270",
"wantarray" => "",
"json_cmd_token" => "vEMHEmMg"
}
expect_3270: timed out.
waiting for $VAR1 = {
'clear_buffer' => 0,
'timeout' => 300,
'expected_status' => qr/RUNNING/u,
'buffer_full' => qr/MORE\.\.\./u,
'delete_lines' => qr/^ +$/u,
'buffer_ready' => $VAR1->{'expected_status'},
'output_delim' => '(?^:Loading Installation System)'
};
[...]
consoles::s3270::expect_3270(consoles::s3270=HASH(0x5601d825c718), "output_delim", "(?^:Loading Installation System)", "timeout", 300) called at /usr/lib/os-autoinst/backend/baseclass.pm line 821
Updated by mkittler 7 months ago
- Status changed from Workable to Resolved
Those are different types of failures now. So I deleted all irrelevant comments referencing this ticket. I suppose reviewers have to look at it and create a new ticket. This time it doesn't look like there's something fundamentally broken with the containerized setup of s390x workers. It rather looks like the SUT is misbehaving on boot.
Updated by mgriessmeier 7 months ago
There seems to be an issue with retrieving the AutoYaST file from the worker, which should be investigated. Based on the results of other jobs, I don't see a "general" issue with the network.
'Downloading AutoYaST file: http://openqaworker23:21013/Cgdb_ENScohOIvfp/files/zv',
'm.xml ',
'[ 13.591189][ T1165] qeth: register layer 3 discipline ',
'[ 13.597000][ T1156] qeth 0.0.0a00: CHID: ff00 CHPID: 0 ',
'[ 13.597678][ T1156] qeth 0.0.0a02: qdio: OSA on SC 2 using AI:1 QEBSM:0 PRI:1',
' TDD:1 SIGA:RW ',
'[ 13.610181][ T1156] qeth 0.0.0a00: Device is a Virtual NIC QDIO card (level: ',
'V730) ',
'[ 13.610181][ T1156] with link type Virt.NIC QDIO. ',
'[ 13.610218][ T1156] qeth 0.0.0a00: Inbound source MAC-address not supported o',
'n (unnamed net_device) ',
'[ 13.610237][ T1156] qeth 0.0.0a00: VLAN enabled ',
'[ 13.610247][ T1156] qeth 0.0.0a00: Multicast enabled ',
'[ 13.610272][ T1156] qeth 0.0.0a00: IPV6 enabled ',
'[ 13.610298][ T1156] qeth 0.0.0a00: Broadcast enabled ',
'[ 13.612387][ T1168] qeth 0.0.0a00 enca00: renamed from eth0 ',
'enca00: network config created ',
'Loading http://openqaworker23:21013/Cgdb_ENScohOIvfp/files/zvm.xml - '
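To narrow down whether the AutoYaST profile is served at all, one could request the URL from another machine in the same network. The following is a minimal sketch only: the helper name `fetch_status` is hypothetical, and the URL in the comment is the one from the log above, which is only valid while that particular job is running.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection or timeout: no HTTP response at all.
        return None


# Example (not executed here; the per-job URL expires with the job):
# fetch_status("http://openqaworker23:21013/Cgdb_ENScohOIvfp/files/zvm.xml")
```

A status code would suggest the worker serves the file and the problem is on the SUT side, while `None` would point at name resolution or connectivity between the SUT network and the worker.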
Updated by okurz 5 months ago
- Related to action #162101: [s390x] timeouts on s390x openQA Workers size:M added