action #102467

closed

test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190."

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-11-15
Due date:
% Done: 0%
Estimated time:
Description

Observation

openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in
reconnect_mgmt_console with

# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
 at /usr/lib/os-autoinst/testapi.pm line 1738.
    testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
    utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
    reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
    basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::start_process() called at /usr/bin/isotovideo line 260

Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000

Jozef Pupava: There is an issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
Oleksandr Orlov: Hi Jozef, do you know if there is any open ticket for this? In the last build there are plenty of failed jobs for that reason.
Jozef Pupava: Don't know about any open ticket
Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any ideas why that may happen? Seems like it fails when os-autoinst tries to read the protocol version from VNC:
$socket->read(my $protocol_version, 12) || die 'unexpected end of data';
Marius Kittler: Most likely the VNC connection was interrupted.
Oleksandr Orlov: I restarted the job just now and got the same issue...
Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant:
xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory
Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.
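
To illustrate why a VNC server that never starts surfaces as "unexpected end of data": in the RFB protocol the server opens the handshake with a 12-byte version string such as "RFB 003.008\n". If the peer closes the connection before sending it, the read returns nothing and the client dies at exactly this step. The following is a minimal sketch in Python, not the os-autoinst implementation (which is Perl); the names read_exact and read_protocol_version are hypothetical and only used for illustration.

```python
import socket

# Length of the RFB ProtocolVersion message, e.g. b"RFB 003.008\n"
RFB_VERSION_LEN = 12


def read_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" means the peer closed the connection,
            # the situation reported as "unexpected end of data" in this ticket
            raise ConnectionError("unexpected end of data")
        buf += chunk
    return buf


def read_protocol_version(sock):
    """Read the server's version string and return (major, minor)."""
    raw = read_exact(sock, RFB_VERSION_LEN)
    if not (raw.startswith(b"RFB ") and raw.endswith(b"\n")):
        raise ValueError("not an RFB version string: %r" % raw)
    major = int(raw[4:7].decode("ascii"))
    minor = int(raw[8:11].decode("ascii"))
    return major, minor
```

With a healthy server the function returns a tuple like (3, 8). With a server side that accepts the connection but closes it right away, which is what a service failing at step EXEC would plausibly look like from the client, it raises the ConnectionError instead.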

Reproducible

Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and is reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigation jobs fail, so this is likely a regression from infrastructure changes.
EDIT: Results from the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed again.

Expected result

Last good: 61.1 from 2021-11-06

Problem

  • H1 ACCEPTED The product has changed -> the comments on the "first bad" job https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed, so this is the most likely hypothesis

  • H2 REJECTED Fails because of changes in test setup -> #102467#note-5

    • H2.1 Our test hardware equipment behaves differently
    • H2.2 The network behaves differently
  • H3 REJECTED Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5

  • H4 REJECTED Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5

  • H5 REJECTED Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5

  • H6 REJECTED Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time -> #102467#note-5

Further details

Always latest result in this scenario: latest
