action #102467

Updated by okurz almost 3 years ago

## Observation 

 openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in 
 [reconnect_mgmt_console](https://openqa.suse.de/tests/7675480/modules/reconnect_mgmt_console/steps/13) with 

 ``` 
 # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190. 
  at /usr/lib/os-autoinst/testapi.pm line 1738. 
	 testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409 
	 utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16 
	 reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344 
	 eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338 
	 basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368 
	 eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368 
	 autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236 
	 eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236 
	 autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292 
	 autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326 
	 eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326 
	 Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477 
	 Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294 
	 autotest::start_process() called at /usr/bin/isotovideo line 260 
 ``` 

 Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000 

 Jozef Pupava: There is an issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13 
 Oleksandr Orlov: Hi Jozef, do you know if there is an open ticket for this? In the last build plenty of jobs failed for that reason. 
 Jozef Pupava: Don't know about any open ticket 
 Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any idea why this might happen? It seems to fail when os-autoinst tries to read the protocol version from VNC: `$socket->read(my $protocol_version, 12) || die 'unexpected end of data';` 
 Marius Kittler: Most likely the VNC connection was interrupted. 
 Oleksandr Orlov: I restarted the job just now and got the same issue... 
 Oleksandr Orlov: ok, maybe it is related to s390x infrastructure 
 Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant: `xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory` 
 Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing. 
 Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package. 
 Oliver Kurz: The tests run on grenache-1.qa, which still has the package "xorg-x11-Xvnc" installed. /var/log/zypper.log states "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend caused this regression. 
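
 To illustrate why a VNC server that fails to start surfaces as "unexpected end of data" on the client side, here is a minimal Python sketch of the 12-byte protocol version read quoted in the chat above. It is not the os-autoinst code: the function names `read_protocol_version` and `demo` are hypothetical, a local `socket.socketpair()` stands in for the Xvnc service, and the loop over partial reads is an addition that the quoted Perl line does not have.

```python
import socket

# The RFB (VNC) handshake starts with a 12-byte version string from the
# server, for example b"RFB 003.008\n".
RFB_VERSION_LENGTH = 12


def read_protocol_version(sock):
    """Read the 12-byte version message or raise if the peer closes early."""
    data = b""
    while len(data) < RFB_VERSION_LENGTH:
        chunk = sock.recv(RFB_VERSION_LENGTH - len(data))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionError("unexpected end of data")
        data += chunk
    return data


def demo():
    """Return (version from a healthy peer, error text from a dead peer)."""
    # Healthy case: the stand-in server sends its version string.
    server, client = socket.socketpair()
    server.sendall(b"RFB 003.008\n")
    version = read_protocol_version(client)
    server.close()
    client.close()

    # Broken case: the stand-in server goes away before sending anything,
    # similar to a service whose start-up script is missing.
    server, client = socket.socketpair()
    server.close()
    try:
        read_protocol_version(client)
        error = None
    except ConnectionError as exc:
        error = str(exc)
    client.close()
    return version, error


if __name__ == "__main__":
    print(demo())
```

 In this sketch the client cannot tell a network interruption from a server process that never started; both look like an early end of data, which is why the service log line about the missing script is the more useful clue.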


 ## Reproducible 

 Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and is reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigate jobs fail, so this is likely a regression caused by infrastructure changes. 


 ## Expected result 

 Last good: [61.1](https://openqa.suse.de/tests/7618957) from 2021-11-06 


 ## Problem 
 * **H1** The product has changed -> unlikely as https://openqa.suse.de/tests/7675480#comments shows that the "last good build" fails as well 
  * **H1.1** product changed slightly but in an acceptable way without the need for communication with DEV+RM --> adapt test 
  * **H1.2** product changed slightly but in an acceptable way found after feedback from RM --> adapt test 
  * **H1.3** product changed significantly --> after approval by RM adapt test 

 * **H2** Fails because of changes in test setup 
  * **H2.1** Our test hardware equipment behaves differently 
  * **H2.2** The network behaves differently 

 * **H3** Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA 
 * **H4** Fails because of changes in test management configuration, e.g. openQA database settings 
 * **H5** Fails because of changes in the test software itself (the test plan in source code as well as needles) 
 * **H6** Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time 



 ## Further details 

 Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm-sle12&test=create_hdd_gnome&version=15-SP4) 
