action #102467
Updated by okurz about 3 years ago
## Observation
openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in
[reconnect_mgmt_console](https://openqa.suse.de/tests/7675480/modules/reconnect_mgmt_console/steps/13) with
```
# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
at /usr/lib/os-autoinst/testapi.pm line 1738.
testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
autotest::start_process() called at /usr/bin/isotovideo line 260
```
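For context, the `unexpected end of data` error appears to be raised while reading the 12-byte RFB protocol version banner that a VNC server sends right after accepting a connection (the failing Perl line is quoted in the chat excerpt in this ticket). A minimal Python sketch of that first handshake step follows. It is not the os-autoinst code; `read_protocol_version` and `RFB_VERSION_LEN` are hypothetical names used for illustration only.

```python
import io

RFB_VERSION_LEN = 12  # the banner "RFB xxx.yyy\n" is always 12 bytes


def read_protocol_version(stream):
    """Read and parse the RFB ProtocolVersion banner from a binary stream."""
    data = stream.read(RFB_VERSION_LEN)
    if not data:
        # The peer closed the connection before sending anything,
        # which is the situation reported as "unexpected end of data".
        raise ConnectionError("unexpected end of data")
    if (
        len(data) != RFB_VERSION_LEN
        or not data.startswith(b"RFB ")
        or not data.endswith(b"\n")
    ):
        raise ValueError(f"malformed RFB banner: {data!r}")
    major, minor = data[4:11].split(b".")
    return int(major), int(minor)


# A healthy server sends its banner first.
print(read_protocol_version(io.BytesIO(b"RFB 003.008\n")))

# A server whose session process never started closes the socket right away.
try:
    read_protocol_version(io.BytesIO(b""))
except ConnectionError as err:
    print(err)
```

An empty read means the server side closed the socket before sending its banner, which would also fit a VNC session that fails to start on the SUT, not only an interrupted network connection.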
Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000
Jozef Pupava: There is issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
Oleksandr Orlov: Hi Jozef, do you know if there are any opened ticket for this? In the last build there are plenty of failed jobs due to that reason.
Jozef Pupava: Don't know about any open ticket
Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? Seems like it fails when os-autoinst tries to read protocol version from VNC
`$socket->read(my $protocol_version, 12) || die 'unexpected end of data';`
Marius Kittler: Most likely the VNC connection was interrupted.
Oleksandr Orlov: I restarted the job just now and got the same issue...
Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant:
`xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory`
Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.
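
To check other logs for the same symptom, the systemd message quoted in the discussion can be matched with a small script. This is only a sketch, assuming log lines contain the message in the form quoted in this ticket; `find_exec_failures` and the regular expression are illustrative and not part of openQA or os-autoinst.

```python
import re

# Matches "<unit>: Failed at step EXEC spawning <path>: <reason>",
# optionally preceded by a timestamp or host prefix.
EXEC_FAILURE = re.compile(
    r"(?P<unit>\S+): Failed at step EXEC spawning (?P<path>\S+): (?P<reason>.+)$"
)


def find_exec_failures(lines):
    """Return (unit, path, reason) for each systemd EXEC spawn failure."""
    failures = []
    for line in lines:
        match = EXEC_FAILURE.search(line.strip())
        if match:
            failures.append((match["unit"], match["path"], match["reason"]))
    return failures


# The line quoted in this ticket, used as sample input.
sample = [
    "xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC "
    "spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory",
]
for unit, path, reason in find_exec_failures(sample):
    print(f"{unit}: cannot start {path} ({reason})")
```

A hit with the reason `No such file or directory` points at a missing file on the machine running the VNC server rather than at a network problem.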
## Reproducible
Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and is reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigation jobs fail as well, so this is likely a regression caused by infrastructure changes.
## Expected result
Last good: [61.1](https://openqa.suse.de/tests/7618957) from 2021-11-06
## Problem
* **H1** The product has changed -> unlikely, as https://openqa.suse.de/tests/7675480#comments shows that the "last good build" fails as well
  * **H1.1** The product changed slightly but in an acceptable way, without the need for communication with DEV+RM -> adapt test
  * **H1.2** The product changed slightly but in an acceptable way, found after feedback from RM -> adapt test
  * **H1.3** The product changed significantly -> after approval by RM, adapt test
* **H2** Fails because of changes in the test setup
  * **H2.1** Our test hardware equipment behaves differently
  * **H2.2** The network behaves differently
* **H3** Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA
* **H4** Fails because of changes in test management configuration, e.g. openQA database settings
* **H5** Fails because of changes in the test software itself (the test plan in source code as well as needles)
* **H6** Sporadic issue, i.e. the root problem has been hidden in the system for a long time but does not show symptoms every time
## Further details
Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle12&test=create_hdd_gnome&version=15-SP4)