action #102467

test fails in reconnect_mgmt_console with auto_review:"Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190."

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2021-11-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in
reconnect_mgmt_console with

# Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 190.
 at /usr/lib/os-autoinst/testapi.pm line 1738.
    testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1409
    utils::reconnect_mgmt_console() called at sle/tests/boot/reconnect_mgmt_console.pm line 16
    reconnect_mgmt_console::run(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/basetest.pm line 344
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 338
    basetest::runtest(reconnect_mgmt_console=HASH(0x1000fb5cad0)) called at /usr/lib/os-autoinst/autotest.pm line 368
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 368
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 236
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 236
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 292
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80), CODE(0x10011786420)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x10010e60c80)) called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::start_process() called at /usr/bin/isotovideo line 260

Reported in https://suse.slack.com/archives/C02CANHLANP/p1636966675331000

Jozef Pupava: There is an issue with reconnecting on s390x https://openqa.suse.de/tests/7675480#step/reconnect_mgmt_console/13
Oleksandr Orlov: Hi Jozef, do you know if there is an open ticket for this? In the last build there are plenty of failed jobs for that reason.
Jozef Pupava: I don't know of any open ticket
Oleksandr Orlov: @Oliver Kurz @Marius Kittler Could you please assist us with this issue? Do you have any idea why that may happen? It seems to fail when os-autoinst tries to read the protocol version from VNC
$socket->read(my $protocol_version, 12) || die 'unexpected end of data';
Marius Kittler: Most likely the VNC connection was interrupted.
Oleksandr Orlov: I restarted the job just now and got the same issue...
Oleksandr Orlov: ok, maybe it is related to s390x infrastructure
Marius Kittler: I guess the error message which appears directly before (within the logs) is relevant:
xvnc@0-10.161.145.85:5901-10.162.6.237:50608.service: Failed at step EXEC spawning /usr/libexec/vnc/with-vnc-key.sh: No such file or directory
Marius Kittler: So it does not look like a connection error but like some script responsible for the VNC server is missing.
Marius Kittler: This script is normally provided by the xorg-x11-Xvnc package.
Oliver Kurz: The tests run on grenache-1.qa which still has the package "xorg-x11-Xvnc". /var/log/zypper.log states "2021-11-15 06:05:05 grenache-1(515195) [libsolv] PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module". Not sure what that means. I suspect package updates over the weekend cause this regression.
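
For illustration, the failing read quoted in the chat can be sketched in Python. This is not the os-autoinst code, only a minimal sketch of the same step: an RFB (VNC) server opens the session by sending a 12-byte ProtocolVersion greeting such as "RFB 003.008\n", and a client that gets end-of-file before those 12 bytes arrive can only report something like "unexpected end of data". That would be consistent with a VNC server process that fails to start after the connection was accepted. The function name read_protocol_version and the exception types are assumptions made for this example.

```python
import socket

# Length of the RFB ProtocolVersion greeting, e.g. b"RFB 003.008\n".
RFB_VERSION_LENGTH = 12


def read_protocol_version(sock: socket.socket) -> str:
    """Read the 12-byte RFB greeting, or raise if the peer closes early.

    Hypothetical helper for illustration; it is not part of os-autoinst.
    """
    data = b""
    while len(data) < RFB_VERSION_LENGTH:
        chunk = sock.recv(RFB_VERSION_LENGTH - len(data))
        if not chunk:
            # The peer closed the connection before a full greeting arrived.
            raise ConnectionError("unexpected end of data")
        data += chunk
    if not data.startswith(b"RFB ") or not data.endswith(b"\n"):
        raise ValueError(f"not an RFB greeting: {data!r}")
    return data.decode("ascii").strip()
```

A client using this helper would fail in the same place as the test whenever the server side closes the socket without sending anything, regardless of whether the network itself is healthy.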

Reproducible

Fails since https://openqa.suse.de/tests/7656248 from 2021-11-12 and reproducible on retry. https://openqa.suse.de/tests/7675480#comments shows that all investigate jobs fail so likely a regression from infrastructure changes.
EDIT: Results from the "first bad" https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed again.

Expected result

Last good: 61.1 from 2021-11-06

Problem

  • H1 ACCEPTED The product has changed -> the results on the "first bad" job https://openqa.suse.de/tests/7656248#comments show that the "last good build" investigation job actually passed, so this is the most likely hypothesis

  • H2 REJECTED Fails because of changes in test setup -> #102467#note-5

    • H2.1 Our test hardware equipment behaves differently
    • H2.2 The network behaves differently
  • H3 REJECTED Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA -> #102467#note-5

  • H4 REJECTED Fails because of changes in test management configuration, e.g. openQA database settings -> #102467#note-5

  • H5 REJECTED Fails because of changes in the test software itself (the test plan in source code as well as needles) -> #102467#note-5

  • H6 REJECTED Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time -> #102467#note-5

Further details

Always latest result in this scenario: latest

History

#1 Updated by okurz 2 months ago

  • Project changed from openQA Tests to openQA Project
  • Category changed from Bugs in existing tests to Concrete Bugs

#2 Updated by okurz 2 months ago

  • Description updated (diff)

#3 Updated by okurz 2 months ago

  • Description updated (diff)

#4 Updated by okurz 2 months ago

  • Description updated (diff)
  • Assignee set to okurz

#5 Updated by okurz 2 months ago

  • Description updated (diff)
  • Status changed from New to In Progress

Picked up the ticket to refine the description based on the automatic investigation information.

I am pretty sure this is a product regression. See the green blob on the "first bad" job https://openqa.suse.de/tests/7656248#comments. The tests are just so unstable in the GNOME login manager handling, due to other problems, that the "last good build" investigation job also often fails, but with different symptoms.

@Oleksandr Orlov I wonder, did you expect anything different from your manual retry in comparison to the automatically triggered investigation jobs?

#6 Updated by cdywan 2 months ago

drop orphaned xorg-x11-Xvnc-module

If those are orphaned packages in the sense of zypper pa --orphaned, that suggests the repo is broken. The package is available on Tumbleweed and Leap 15.2. That said, I see "Connected to Xvnc - PID" in the failing tests, so it doesn't seem like anything required was in fact removed here.
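
To see which packages the solver scheduled for dropping, one could scan a copy of /var/log/zypper.log for such jobs. The sketch below is only an illustration: the regular expression is derived solely from the single log line quoted in the description, so the exact log format is an assumption, and orphaned_packages is a hypothetical helper name.

```python
import re

# Pattern assumed from the one log line quoted in this ticket:
#   "... PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module"
ORPHAN_JOB = re.compile(r"job: drop orphaned (\S+)")


def orphaned_packages(log_lines):
    """Return package names from 'drop orphaned' solver jobs, first seen first."""
    seen = []
    for line in log_lines:
        match = ORPHAN_JOB.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "2021-11-15 06:05:05 grenache-1(515195) [libsolv] "
    "PoolImpl.cc(logSat):127 job: drop orphaned xorg-x11-Xvnc-module",
]
print(orphaned_packages(sample))
```

Comparing that list against what is actually installed on the worker would show whether anything relevant was really removed.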

#7 Updated by okurz 2 months ago

  • Status changed from In Progress to Blocked

There are clear indications that this is a product regression. Reported as https://bugzilla.suse.com/show_bug.cgi?id=1192713

#8 Updated by okurz 2 months ago

Bug fixes are already available. SRs have been submitted but not yet accepted.

#9 Updated by okurz 2 months ago

  • Status changed from Blocked to Resolved

Both SRs have been accepted. https://openqa.suse.de/tests/7716270 shows that all is good again in sle-15-SP4-Online-s390x-create_hdd_gnome@s390x-kvm-sle12

Thank you