action #49793

closed

Can't locate object method "blessed" via package "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.

Added by riafarov about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2019-03-28
Due date:
% Done: 0%
Estimated time:

Description

Observation

[2019-03-27T17:13:56.887 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't locate object method "blessed" via package "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.
" (perhaps you forgot to load "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.
"?) at /usr/lib/os-autoinst/consoles/network_console.pm line 34.

All s390x tests (both zKVM and zVM) fail as incomplete.

openQA test in scenario sle-12-SP5-Server-DVD-s390x-allpatterns@s390x-kvm-sle12 fails in reconnect_mgmt_console

Test suite description

Maintainers: okurz

Installation with all patterns selected for installation to check for potential package conflicts, how the system handles big space usage, etc.

allpatterns installations can take longer, especially on non-x86_64 architectures.

Reproducible

Fails since (at least) Build 0112

Sporadic on workers:

  • grenache-1 for zKVM tests
  • openqaworker2 for zVM tests

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by riafarov about 5 years ago

  • Description updated (diff)
Actions #2

Updated by mkittler about 5 years ago

All s390x tests (both zKVM and zVM) fail as incomplete.

I think the impact isn't that high. E.g. I recently restarted the s390x-zVM test https://openqa.suse.de/tests/2747431 to test whether another issue is fixed now and it passed.

But maybe it is actually the same issue as https://progress.opensuse.org/issues/49676 - even though the error message is different. So I restarted https://openqa.suse.de/tests/2748923 to see whether the fix we deployed yesterday afternoon fixes this as well.

Actions #3

Updated by mkittler about 5 years ago

  • Assignee set to mkittler
  • Target version set to Current Sprint
Actions #4

Updated by mkittler about 5 years ago

Ok, it still fails. And it is happening on grenache-1 where the change which caused https://progress.opensuse.org/issues/49676 isn't even deployed. So I guess it is really a different issue.

Actions #5

Updated by mkittler about 5 years ago

Note that no deployment of os-autoinst happened between the last good and the first incomplete (or even the last incomplete). It appears the last deployment announced on Tuesday didn't include the workers which are relevant here. The os-autoinst version was always 4.5.1551283945.1b03daca according to the log.

Actions #6

Updated by mkittler about 5 years ago

As I already mentioned, changes in os-autoinst cannot make a difference here because the version hasn't changed. I also saw no relevant changes in the test distribution. So it must be something else.

The error basically says the VNC protocol handshake succeeded but the security handshake failed. If I read the code correctly, the password is not used at this point, so an incorrect password is likely not the cause either. Instead it seems that the server returns no "security types". The VNC standard has the following to say about this:

If number-of-security-types is zero, then for some reason the connection failed (e.g. the server can not support the desired protocol version). This is followed by a string describing the reason (where a string is specified as a length followed by that many ASCII characters):

Unfortunately os-autoinst does not read the reason, so we don't know why it fails. We should probably implement reading the reason for the failure. Nevertheless, the question is what might have changed on the server side.
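To illustrate the handshake step quoted above, here is a minimal Python sketch of how a client could parse the server's security-types message and pick up the failure reason. It is not the os-autoinst code (which is Perl) and not the content of any PR mentioned in this ticket. The function name `parse_security_types` is made up for this example, and the sketch assumes the RFB 3.7/3.8 wire layout: a one-byte count, and for a count of zero a 4-byte big-endian length followed by the reason text.

```python
import struct


def parse_security_types(payload: bytes):
    """Parse the server's security-types message (assumed RFB 3.7/3.8 layout).

    Returns a tuple (security_types, failure_reason). On success the reason
    is None; on failure the list is empty and the reason is a string.
    """
    if len(payload) < 1:
        raise ValueError("truncated message: missing number-of-security-types")
    count = payload[0]

    if count > 0:
        # Normal case: 'count' one-byte security type identifiers follow.
        types = payload[1:1 + count]
        if len(types) < count:
            raise ValueError("truncated message: missing security types")
        return list(types), None

    # count == 0: the connection failed and the server explains why.
    if len(payload) < 5:
        raise ValueError("truncated message: missing reason length")
    (reason_len,) = struct.unpack(">I", payload[1:5])
    reason = payload[5:5 + reason_len]
    if len(reason) < reason_len:
        raise ValueError("truncated message: incomplete reason string")
    return [], reason.decode("ascii", errors="replace")


# Usage with hand-built messages (the reason text is invented for the example).
ok_message = b"\x02\x01\x02"
text = b"too many connections"
failed_message = b"\x00" + struct.pack(">I", len(text)) + text

print(parse_security_types(ok_message))
print(parse_security_types(failed_message))
```

With something along these lines, the failure reason could be included in the error instead of a bare "Error authenticating". It only helps if the server actually sends a reason before closing the connection.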

Actions #7

Updated by SLindoMansilla about 5 years ago

Interesting, maybe this? https://bugzilla.suse.com/show_bug.cgi?id=1129412
Wrong SLE version. Also the VNC is happening on the worker, not the SUT.

Actions #8

Updated by riafarov about 5 years ago

SLindoMansilla wrote:

Interesting, maybe this? https://bugzilla.suse.com/show_bug.cgi?id=1129412

Not sure if that one is a duplicate of https://bugzilla.suse.com/show_bug.cgi?id=1129073 but that's for SLE 15, so it should not affect SLE 12.

Actions #9

Updated by SLindoMansilla about 5 years ago

I can't find any zVM job failing on this. Could you provide a link?
I think only zKVM is affected.
I also think that it is sporadic, because not all zKVM jobs failed.

(I think IRC doesn't work for me over VPN)

Actions #10

Updated by SLindoMansilla about 5 years ago

zVM affected job found: https://openqa.suse.de/tests/2745246

Actions #11

Updated by SLindoMansilla about 5 years ago

  • Description updated (diff)

Sporadically reproducible on workers:

  • grenache-1 for zKVM tests
  • openqaworker2 for zVM tests

Examples of passed jobs:

Actions #12

Updated by SLindoMansilla about 5 years ago

  • Description updated (diff)
Actions #14

Updated by mkittler about 5 years ago

I tried to test my PR https://github.com/os-autoinst/os-autoinst/pull/1137 locally to see what error message the server gives us. However, it didn't work: first because I hadn't set WORKER_HOSTNAME, and then because grenache-1:13 was apparently re-enabled and interfered with my testing (I'm using its virsh config for my local tests).

Actions #15

Updated by mkittler about 5 years ago

@mgriessmeier tested a previous build and it worked. So it is likely really just the product bug which has already been mentioned: https://bugzilla.suse.com/show_bug.cgi?id=1129412

Of course the error handling on the os-autoinst-side should be better:

  1. The error from the VNC server should be returned. That is what I'm attempting here: https://github.com/os-autoinst/os-autoinst/pull/1137
  2. It should be a test failure and not an incomplete - at least if we actually can connect to the VNC server but it behaves unexpectedly (like here).
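The second point can be sketched roughly as follows. This is only an illustration of the idea in Python, not how os-autoinst is structured: the exception classes and `classify_vnc_error` are hypothetical names, and the mapping to the result strings "failed" and "incomplete" is an assumption based on the wording above.

```python
class VncConnectError(Exception):
    """Hypothetical: the VNC server could not be reached at all."""


class VncProtocolError(Exception):
    """Hypothetical: connected, but the server behaved unexpectedly."""


def classify_vnc_error(error: Exception) -> str:
    """Map a VNC error to a job result (assumed result names)."""
    if isinstance(error, VncProtocolError):
        # We reached the server, so its misbehaviour is reported as a test failure.
        return "failed"
    # Anything else (e.g. no connection) is treated as an infrastructure problem.
    return "incomplete"


print(classify_vnc_error(VncProtocolError("no security types offered")))
print(classify_vnc_error(VncConnectError("connection refused")))
```

The point of the split is that a server which answers but misbehaves says something about the product under test, while an unreachable server usually says something about the test infrastructure.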
Actions #16

Updated by mkittler about 5 years ago

PR for getting rid of Can't locate object method "blessed" via package " in the error message: https://github.com/os-autoinst/os-autoinst/pull/1139

Actions #17

Updated by riafarov about 5 years ago

So, as we suspected, it's a bug on s390x https://bugzilla.suse.com/show_bug.cgi?id=1131569 which breaks the VNC connection.
@mkittler: Thanks for improving handling of such cases!

Actions #18

Updated by mkittler about 5 years ago

  • Status changed from New to Resolved

If it is a product bug, there's nothing to fix on the openQA side besides providing a better error message. The mentioned PR has already been merged.

However, my attempt to read the error message from the VNC server wasn't successful (see https://github.com/os-autoinst/os-autoinst/pull/1137#issuecomment-483311767). But if the server is killed by the OOM killer that's actually no surprise.

Since reading the VNC server error is not helpful for this issue, I am marking it as resolved even though the PR hasn't been merged yet.
