action #102467
test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190."
Description
Observation
openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in
reconnect_mgmt_console with
# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
at /usr/lib/os-autoinst/testapi.pm line 1738.
testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
autotest::start_process() called at /usr/bin/isotovideo line 260
Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000
Jozef Pupava: There is an issue with reconnecting on s390x: https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
Oleksandr Orlov: Hi Jozef, do you know if there are any open tickets for this? In the last build there are plenty of jobs that failed for that reason.
Jozef Pupava: Don't know about any open ticket
Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? It seems like it fails when os-autoinst tries to read the protocol version from VNC:
$socket->read(my $protocol_version, 12) || die 'unexpected end of data';
Marius Kittler: Most likely the VNC connection was interrupted.
Oleksandr Orlov: I restarted the job just now and got the same issue...
Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant:
xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory
Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.
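For context, here is a minimal sketch of the handshake step that dies in VNC.pm. It assumes the standard RFB handshake (RFC 6143, section 7.1.1): right after the TCP connection is accepted, the server sends a 12-byte ProtocolVersion string such as "RFB 003.008\n". Our reading of the log is that the per-connection xvnc@... service failed to exec the missing with-vnc-key.sh, so the connection was presumably closed before any byte was sent. In that case read returns 0 and the client reports "unexpected end of data". The helper read_rfb_protocol_version is hypothetical and is not the actual os-autoinst code. In-memory filehandles stand in for the VNC socket.

```perl
use strict;
use warnings;

# Hypothetical helper (not the os-autoinst implementation): read the 12-byte
# RFB ProtocolVersion message, e.g. "RFB 003.008\n", and return (major, minor).
sub read_rfb_protocol_version {
    my ($fh) = @_;
    my $n = read($fh, my $buf, 12);
    # undef: I/O error; 0: peer closed the connection before sending anything,
    # which is the "unexpected end of data" case seen in this ticket.
    die "read error: $!\n" unless defined $n;
    die "unexpected end of data (server closed connection)\n" if $n == 0;
    die "short protocol version ($n bytes)\n" if $n < 12;
    my ($major, $minor) = $buf =~ /\ARFB (\d{3})\.(\d{3})\n\z/
      or die "not an RFB protocol version: '$buf'\n";
    return ($major + 0, $minor + 0);
}

# Usage: a healthy server greeting versus a connection closed immediately.
open my $good, '<', \"RFB 003.008\n" or die "open: $!";
printf "server speaks RFB %d.%d\n", read_rfb_protocol_version($good);

open my $closed, '<', \'' or die "open: $!";
eval { read_rfb_protocol_version($closed) };
print "error: $@";
```

Splitting the single `|| die` into separate checks is only meant to show that a closed connection and a garbled greeting are different failure modes. The symptom here matches the first one.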
Reproducible
Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and is reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigation jobs fail, so this is likely a regression from infrastructure changes.
EDIT: Results from the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed again.
Expected result
Last good: 61.1 from 2021-11-06
Problem
H1 ACCEPTED The product has changed -> the "first bad" https://openqa.suse.de/tests/7656248#comments shows that the "last good build" investigation job actually passed, so this is the most likely hypothesis
H2 REJECTED Fails because of changes in test setup -> #102467#note-5
- H2.1 Our test hardware equipment behaves differently
- H2.2 The network behaves differently
H3 REJECTED Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5
H4 REJECTED Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5
H5 REJECTED Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5
H6 REJECTED Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time -> #102467#note-5
Further details
Always latest result in this scenario: latest
Updated by okurz almost 3 years ago
- Project changed from openQA Tests to openQA Project
- Category changed from Bugs in existing tests to Regressions/Crashes
Updated by okurz almost 3 years ago
- Description updated
- Assignee set to okurz
Updated by okurz almost 3 years ago
- Description updated
- Status changed from New to In Progress
Picked up the ticket to refine the description based on the automatic investigation information.
I am pretty sure this is a product regression. See the green blob on the "first bad" https://openqa.suse.de/tests/7656248#comments? It's just that the tests are so unstable in the GNOME login manager handling, due to other problems, that the "last good build" investigation job also often fails, but with different symptoms.
@Oleksandr Orlov I wonder, did you expect anything different from your manual retry in comparison to the automatically triggered investigation jobs?
I ran an SQL query on the OSD database:
select job_id from job_modules as m left join jobs on jobs.id = m.job_id where m.t_updated >= '2021-11-15' and t_finished >= '2021-11-15' and name ~ 'reconnect_mgmt_console' and machine = 's390x-kvm-sle12' limit 1;
which revealed https://openqa.suse.de/tests/7677247#step/reconnect_mgmt_console/11 showing that SLE15-SP3 tests from today on s390x-kvm-sle12 still pass the step of "reconnect_mgmt_console" just fine -> rejecting H2+H3
https://openqa.suse.de/tests/7656248#investigation shows no relevant changes in test settings -> rejecting H4
https://openqa.suse.de/tests/7656248#investigation shows no relevant changes in test code and SLE15-SP3 tests are fine -> rejecting H5
https://openqa.suse.de/tests/7656248#comments shows multiple retries with the same problem, and nobody reported the same before -> rejecting H6
Updated by livdywan almost 3 years ago
drop orphaned xorg-x11-Xvnc-module
If those are orphaned packages in the sense of "zypper pa --orphaned", that suggests the repo is broken. The package is available on Tumbleweed and Leap 15.2. That said, I see "Connected to Xvnc - PID" in the failing tests, so it doesn't seem like anything required was actually removed here.
Updated by okurz almost 3 years ago
- Status changed from In Progress to Blocked
Clear indications that this is a product regression. Reported as https://bugzilla.suse.com/show_bug.cgi?id=1192713
Updated by okurz almost 3 years ago
Bug fixes are already available; SRs have been submitted but not yet accepted.
Updated by okurz almost 3 years ago
- Status changed from Blocked to Resolved
Both SRs were accepted. https://openqa.suse.de/tests/7716270 shows everything is good again in sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12
Thank you