action #102467
Updated by okurz almost 3 years ago
## Observation

openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in [reconnect_mgmt_console](https://openqa.suse.de/tests/7675480/modules/reconnect_mgmt_console/steps/13) with

```
# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
 at /usr/lib/os-autoinst/testapi.pm line 1738.
    testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
    utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
    reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
    basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::start_process() called at /usr/bin/isotovideo line 260
```

Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000

* Jozef Pupava: There is issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
* Oleksandr Orlov: Hi Jozef, do you know if there are any opened ticket for this? In the last build there are plenty of failed jobs due to that reason.
* Jozef Pupava: Don't know about any open ticket
* Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? Seems like it fails when os-autoinst tries to read protocol version from VNC `$socket->read(my $protocol_version, 12) || die 'unexpected end of data';`
* Marius Kittler: Most likely the VNC connection was interrupted.
* Oleksandr Orlov: I restarted the job just now and got the same issue...
* Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
* Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant: `xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory`
* Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
* Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
* Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.

## Reproducible

Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigate jobs fail so likely a regression from infrastructure changes.

EDIT: Results from the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed again.
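
For context on the `unexpected end of data` message from the observation: a VNC (RFB) server opens the session by sending a 12-byte ProtocolVersion string such as `RFB 003.008\n`, and the quoted Perl line dies when that read yields nothing. The following is a minimal Python sketch of the same check, not the os-autoinst implementation. The names `read_protocol_version` and `demo` are hypothetical, and a local socket pair stands in for the real VNC endpoint. It only illustrates that a peer which closes the connection before sending anything (for example because the server process was never spawned) produces exactly this kind of error.

```python
import socket
from typing import Tuple

RFB_VERSION_LENGTH = 12  # length of the ProtocolVersion message, e.g. b"RFB 003.008\n"


def read_protocol_version(sock: socket.socket) -> str:
    """Read the 12-byte RFB ProtocolVersion message or raise on early EOF."""
    data = b""
    while len(data) < RFB_VERSION_LENGTH:
        chunk = sock.recv(RFB_VERSION_LENGTH - len(data))
        if not chunk:
            # The peer closed the connection before the handshake completed.
            raise ConnectionError("unexpected end of data")
        data += chunk
    return data.decode("ascii").rstrip("\n")


def demo() -> Tuple[str, str]:
    """Exercise the healthy case and the 'server went away' case locally."""
    # Healthy case: the fake server sends a version string.
    server, client = socket.socketpair()
    server.sendall(b"RFB 003.008\n")
    ok = read_protocol_version(client)
    server.close()
    client.close()

    # Broken case: the fake server closes without sending anything.
    server, client = socket.socketpair()
    server.close()
    try:
        read_protocol_version(client)
        failed = "no error"
    except ConnectionError as exc:
        failed = str(exc)
    finally:
        client.close()
    return ok, failed


if __name__ == "__main__":
    print(demo())
```

Seen from the client, a missing server-side script and an interrupted connection look the same, which is why the systemd message in the worker logs was needed to tell them apart.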
## Expected result

Last good: [61.1](https://openqa.suse.de/tests/7618957) from 2021-11-06

## Problem

* **H1** *ACCEPTED* The product has changed -> the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed, so this is the likely hypothesis. This was initially considered unlikely because https://openqa.suse.de/tests/7675480#comments showed the "last good build" investigation job failing as well.
  * **H1.1** product changed slightly but in an acceptable way without the need for communication with DEV+RM --> adapt test
  * **H1.2** product changed slightly but in an acceptable way found after feedback from RM --> adapt test
  * **H1.3** product changed significantly --> after approval by RM adapt test
* **H2** *REJECTED* Fails because of changes in test setup -> #102467#note-5
  * **H2.1** Our test hardware equipment behaves differently
  * **H2.2** The network behaves differently
* **H3** *REJECTED* Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5
* **H4** *REJECTED* Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5
* **H5** *REJECTED* Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5
* **H6** *REJECTED* Sporadic issue, i.e. the root problem has been hidden in the system for a long time but does not show symptoms every time -> #102467#note-5

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle12&test=create_hdd_gnome&version=15-SP4)