action #102467

Updated by okurz over 2 years ago

## Observation 

 openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in 
 reconnect_mgmt_console with 

 # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/ line 190. 
  at /usr/lib/os-autoinst/ line 1738. 
	 testapi::select_console("x11", "await_console", 0) called at sle/lib/ line 1409 
	 utils::reconnect_mgmt_console() called at sle/tests/boot/ line 16 
	 reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/ line 344 
	 eval {...} called at /usr/lib/os-autoinst/ line 338 
	 basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/ line 368 
	 eval {...} called at /usr/lib/os-autoinst/ line 368 
	 autotest::runalltests() called at /usr/lib/os-autoinst/ line 236 
	 eval {...} called at /usr/lib/os-autoinst/ line 236 
	 autotest::run_all() called at /usr/lib/os-autoinst/ line 292 
	 autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ line 326 
	 eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ line 326 
	 Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ line 477 
	 Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/ line 294 
	 autotest::start_process() called at /usr/bin/isotovideo line 260 

 Reported in 

 Jozef Pupava: There is issue with reconnecting on s390x 
 Oleksandr Orlov: Hi Jozef, do you know if there are any opened ticket for this? In the last build there are plenty of failed jobs due to that reason. 
 Jozef Pupava: Don't know about any open ticket 
 Oleksandr Orlov: @Oliver Kurz    @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? Seems like it fails when os-autoinst tries to read protocol version from VNC 
 $socket->read(my $protocol_version, 12) || die 'unexpected end of data'; 
 Marius Kittler: Most likely the VNC connection was interrupted. 
 Oleksandr Orlov: I restarted the job just now and got the same issue... 
 Oleksandr Orlov: ok, maybe it is related to s390x infrastructure 
 Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant: 
 xvnc@0- Failed at step EXEC spawning /usr/libexec/vnc/ No such file or directory 
 Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing. 
 Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package. 
 Oliver Kurz: The tests run on which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression. 
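
 The quoted `$socket->read(my $protocol_version, 12)` line corresponds to the first step of the RFB (VNC) handshake, in which the server sends a 12-byte version string such as `RFB 003.008\n`. The following is a minimal Python sketch of that step, not the os-autoinst implementation; the helper names `read_exact` and `read_rfb_version` are hypothetical. It illustrates why a VNC server that never starts (for example because its startup script is missing) would surface on the client side as "unexpected end of data" rather than as a protocol error.

```python
import re
import socket

# Length and shape of the ProtocolVersion message ("RFB xxx.yyy\n"),
# as described in RFC 6143, section 7.1.1.
RFB_VERSION_LEN = 12
RFB_VERSION_RE = re.compile(rb"RFB (\d{3})\.(\d{3})\n")


def read_exact(sock, length):
    """Read exactly `length` bytes, raising EOFError if the peer closes early."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # Peer closed the connection before sending everything we need.
            received = length - remaining
            raise EOFError(
                f"unexpected end of data after {received} of {length} bytes"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_rfb_version(sock):
    """Return (major, minor) from the server's RFB version string."""
    data = read_exact(sock, RFB_VERSION_LEN)
    match = RFB_VERSION_RE.fullmatch(data)
    if match is None:
        raise ValueError(f"not an RFB version string: {data!r}")
    return int(match.group(1)), int(match.group(2))
```

 Reporting how many bytes arrived before the connection closed helps distinguish "nothing is serving VNC on that port" (0 bytes) from a connection interrupted mid-handshake.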

 ## Reproducible 

 Fails since 2021-11-12 and is reproducible on retry. shows that all investigate jobs fail, so this is likely a regression from infrastructure changes. 
 EDIT: Results from the "first bad" show that the "last good build" investigation job actually passed again. 

 ## Expected result 

 Last good: 61.1 from 2021-11-06 

 ## Problem 
 * **H1** *ACCEPTED* The product has changed -> results from the "first bad" show that the "last good build" investigation job actually passed, so this is the likely hypothesis 
  * **H1.1** product changed slightly but in an acceptable way without the need for communication with DEV+RM --> adapt test 
  * **H1.2** product changed slightly but in an acceptable way found after feedback from RM --> adapt test 
  * **H1.3** product changed significantly --> after approval by RM adapt test 

 * **H2** *REJECTED* Fails because of changes in test setup -> #102467#note-5 
  * **H2.1** Our test hardware equipment behaves differently 
  * **H2.2** The network behaves differently 

 * **H3** *REJECTED* Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5 
 * **H4** *REJECTED* Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5 
 * **H5** *REJECTED* Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5 
 * **H6** *REJECTED* Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time -> #102467#note-5 


 ## Further details 

 Always latest result in this scenario: [latest](