action #102467: test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190." - openQA Project (public) - openSUSE Project Management Tool

action #102467

closed

test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190."

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: okurz
Category: Regressions/Crashes
Target version: Ready
Start date: 2021-11-15
Due date:
% Done:
Estimated time:

Description

Observation

openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in
reconnect_mgmt_console with

# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
 at /usr/lib/os-autoinst/testapi.pm line 1738.
	testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
	utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
	reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
	eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
	basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
	eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
	autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
	eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
	autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
	autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
	eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
	Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
	Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
	autotest::start_process() called at /usr/bin/isotovideo line 260

Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000

Jozef Pupava: There is issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
Oleksandr Orlov: Hi Jozef, do you know if there are any opened ticket for this? In the last build there are plenty of failed jobs due to that reason.
Jozef Pupava: Don't know about any open ticket
Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? Seems like it fails when os-autoinst tries to read protocol version from VNC
$socket->read(my $protocol_version, 12) || die 'unexpected end of data';
Marius Kittler: Most likely the VNC connection was interrupted.
Oleksandr Orlov: I restarted the job just now and got the same issue...
Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant:
xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory
Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.
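
For context on the failing line: an RFB (VNC) server opens every connection by sending a 12-byte ProtocolVersion message such as "RFB 003.008\n". If the Xvnc service never starts, the socket is closed before those bytes arrive and the read comes up short. Below is a minimal Python sketch of that check. It is not the os-autoinst Perl implementation, and the helper names read_exact and read_protocol_version are hypothetical.

```python
import re
import socket


def read_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # Same symptom as in the ticket: the server went away mid-handshake.
            raise EOFError("unexpected end of data")
        buf += chunk
    return buf


def read_protocol_version(sock):
    """Parse the 12-byte RFB ProtocolVersion message into (major, minor)."""
    raw = read_exact(sock, 12)
    match = re.fullmatch(rb"RFB (\d{3})\.(\d{3})\n", raw)
    if match is None:
        raise ValueError(f"not an RFB version message: {raw!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # A local socket pair stands in for the VNC server and client.
    server, client = socket.socketpair()
    server.sendall(b"RFB 003.008\n")
    print(read_protocol_version(client))
    server.close()
    client.close()
```

An early close on the server side surfaces as EOFError here, which is why the error alone cannot distinguish a network interruption from a VNC server that failed to spawn; the preceding log lines are needed for that.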

Reproducible

Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and is reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigation jobs fail, so this is likely a regression from infrastructure changes.
EDIT: Results from the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed again.

Expected result

Last good: 61.1 from 2021-11-06

Problem

H1 ACCEPTED The product has changed -> the comments on the "first bad" job https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed, so this is the most likely hypothesis
H2 REJECTED Fails because of changes in test setup -> #102467#note-5
H2.1 Our test hardware equipment behaves differently
H2.2 The network behaves differently
H3 REJECTED Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5
H4 REJECTED Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5
H5 REJECTED Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5
H6 REJECTED Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time -> #102467#note-5
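
The way investigation job results feed into these hypotheses can be sketched in Python. This is only an illustration of the reasoning in this ticket, not openQA code: the function name classify and the job kind keys "retry", "last_good_tests" and "last_good_build" are assumptions, and only the "last good build passed" and "everything failed" branches are taken directly from the description above.

```python
def classify(results):
    """Map investigation job results to the most likely hypothesis.

    `results` maps a job kind to True (passed) or False (failed).
    Missing kinds are treated as failed.
    """
    if results.get("retry"):
        # Assumption: a plain retry passing points to a sporadic issue.
        return "H6: sporadic issue"
    if results.get("last_good_build"):
        # From the ticket: the old product build passes with current tests.
        return "H1: product changed"
    if results.get("last_good_tests"):
        # Assumption: the old test code passing points to a test change.
        return "H5: test code changed"
    # From the ticket: all investigation jobs failing suggests the environment.
    return "H2-H4: setup, infrastructure or configuration changed"


if __name__ == "__main__":
    observed = {"retry": False, "last_good_tests": False, "last_good_build": True}
    print(classify(observed))
```

The ordering is a design choice: a passing retry is checked first because it makes the other comparisons inconclusive.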

Further details

Always latest result in this scenario: latest

Updated by okurz over 3 years ago

Project changed from openQA Tests (public) to openQA Project (public)
Category changed from Bugs in existing tests to Regressions/Crashes

Updated by okurz over 3 years ago

Description updated (diff)

Updated by okurz over 3 years ago

Description updated (diff)

Updated by okurz over 3 years ago

Description updated (diff)
Assignee set to okurz

Updated by okurz over 3 years ago

Description updated (diff)
Status changed from New to In Progress

Picked up the ticket to refine the description based on information from the automatic investigation.

I am pretty sure this is a product regression: note the green blob on the "first bad" job https://openqa.suse.de/tests/7656248#comments. The tests are so unstable in the GNOME login manager handling, due to other problems, that the "last good build" investigation job also often fails, but with different symptoms.

@Oleksandr Orlov I wonder, did you expect anything different from your manual retry in comparison to the automatically triggered investigation jobs?

I ran this SQL query on the OSD database: select job_id from job_modules as m left join jobs on jobs.id = m.job_id where m.t_updated >= '2021-11-15' and t_finished >= '2021-11-15' and name ~ 'reconnect_mgmt_console' and machine = 's390x-kvm-sle12' limit 1; It revealed https://openqa.suse.de/tests/7677247#step/reconnect_mgmt_console/11, showing that SLE15-SP3 tests from today on s390x-kvm-sle12 still pass the "reconnect_mgmt_console" step just fine -> rejecting H2+H3
https://openqa.suse.de/tests/7656248#investigation shows no relevant changes in test settings -> rejecting H4
https://openqa.suse.de/tests/7656248#investigation shows no relevant changes in test code and SLE15-SP3 tests are fine -> rejecting H5
https://openqa.suse.de/tests/7656248#comments shows multiple retries with the same problem, and nobody reported this before -> rejecting H6
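
The shape of the database lookup used to reject H2+H3 can be tried locally with a stand-in. The sketch below uses Python's sqlite3 with an in-memory database. The two-table schema is a simplified assumption, the rows and job ids are made up for illustration, and instr() replaces the PostgreSQL "~" regex operator, which SQLite does not provide by default. The names find_recent_job and make_demo_db are hypothetical.

```python
import sqlite3


def find_recent_job(conn, since, module, machine):
    """Return one job id that ran `module` on `machine` since `since`, or None."""
    row = conn.execute(
        """
        SELECT m.job_id
        FROM job_modules AS m
        LEFT JOIN jobs ON jobs.id = m.job_id
        WHERE m.t_updated >= ? AND jobs.t_finished >= ?
          AND instr(m.name, ?) > 0 AND jobs.machine = ?
        LIMIT 1
        """,
        (since, since, module, machine),
    ).fetchone()
    return row[0] if row else None


def make_demo_db():
    """Build a tiny in-memory database with made-up rows (not real OSD data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, machine TEXT, t_finished TEXT);
        CREATE TABLE job_modules (job_id INTEGER, name TEXT, t_updated TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?)",
        [
            (1, "64bit", "2021-11-15 08:00:00"),            # wrong machine
            (2, "s390x-kvm-sle12", "2021-11-10 08:00:00"),  # too old
            (3, "s390x-kvm-sle12", "2021-11-15 12:00:00"),  # the only match
        ],
    )
    conn.executemany(
        "INSERT INTO job_modules VALUES (?, ?, ?)",
        [
            (1, "reconnect_mgmt_console", "2021-11-15 07:55:00"),
            (2, "reconnect_mgmt_console", "2021-11-10 07:55:00"),
            (3, "reconnect_mgmt_console", "2021-11-15 11:55:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_recent_job(conn, "2021-11-15", "reconnect_mgmt_console", "s390x-kvm-sle12"))
```

Timestamps are stored as ISO 8601 text so that plain string comparison orders them correctly. A hit from a different product on the same machine is what allows the setup and infrastructure hypotheses to be ruled out.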

Updated by livdywan over 3 years ago

drop orphaned xorg-x11-Xvnc-module

If those are orphaned packages in the sense of zypper pa --orphaned, that suggests the repo is broken. The package is available on Tumbleweed and Leap 15.2. That said, I see "Connected to Xvnc - PID" in the failing tests, so it doesn't seem like anything required was in fact removed here.
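
To see which packages the solver dropped as orphaned, log lines like the one quoted in the description can be scanned for the "job: drop orphaned" marker. A small Python sketch, assuming the line format matches that single quoted example; the function name dropped_orphans is hypothetical:

```python
import re

# Matches the solver job text seen in the quoted /var/log/zypper.log line.
DROP_ORPHANED = re.compile(r"job: drop orphaned (\S+)")


def dropped_orphans(log_text):
    """Return the names following 'job: drop orphaned' in the given log text."""
    return DROP_ORPHANED.findall(log_text)


if __name__ == "__main__":
    sample = (
        "2021-11-15 06:05:05 <1> grenache-1(515195) [libsolv] "
        "PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module"
    )
    print(dropped_orphans(sample))
```

This only lists what the log mentions. Whether a package was actually removed still has to be checked on the worker itself.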

Updated by okurz over 3 years ago

Status changed from In Progress to Blocked

There are clear indications that this is a product regression. Reported as https://bugzilla.suse.com/show_bug.cgi?id=1192713

Updated by okurz over 3 years ago

Bug fixes are already available; SRs have been submitted but not yet accepted.

Updated by okurz over 3 years ago

Status changed from Blocked to Resolved

Both SRs have been accepted. https://openqa.suse.de/tests/7716270 shows all good again in sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12

Thank you
