action #49793
closed
Can't locate object method "blessed" via package "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.
Description
Observation
[2019-03-27T17:13:56.887 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Can't locate object method "blessed" via package "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.
" (perhaps you forgot to load "Error authenticating at /usr/lib/os-autoinst/consoles/VNC.pm line 227.
"?) at /usr/lib/os-autoinst/consoles/network_console.pm line 34.
All s390x tests (both zKVM and zVM) fail as incomplete.
openQA test in scenario sle-12-SP5-Server-DVD-s390x-allpatterns@s390x-kvm-sle12 fails in reconnect_mgmt_console
Test suite description
Maintainers: okurz
Installation with all patterns selected for installation to check for potential package conflicts, how the system handles big space usage, etc.
allpatterns installations can take longer, especially on non-x86_64 architectures.
Reproducible
Fails since (at least) Build 0112
Sporadic on workers:
- grenache-1 for zKVM tests
- openqaworker2 for zVM tests
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mkittler over 5 years ago
All s390x tests (both zKVM and zVM) fail as incomplete.
I think the impact isn't that high. E.g. I recently restarted the s390x-zVM test https://openqa.suse.de/tests/2747431 to test whether another issue is fixed now and it passed.
But maybe it is actually the same issue as https://progress.opensuse.org/issues/49676 - even though the error message is different. So I restarted https://openqa.suse.de/tests/2748923 to see whether the fix we deployed yesterday afternoon fixes this as well.
Updated by mkittler over 5 years ago
- Assignee set to mkittler
- Target version set to Current Sprint
Updated by mkittler over 5 years ago
Ok, it still fails. And it is happening on grenache-1 where the change which caused https://progress.opensuse.org/issues/49676 isn't even deployed. So I guess it is really a different issue.
Updated by mkittler over 5 years ago
Note that no deployment of os-autoinst happened between the last good and the first incomplete (or even the last incomplete). It appears the last deployment announced on Tuesday didn't include the workers which are relevant here. The os-autoinst version was always 4.5.1551283945.1b03daca according to the log.
Updated by mkittler over 5 years ago
As I already mentioned, changes in os-autoinst cannot make a difference here because the version hasn't changed. I also saw no relevant changes in the test distribution. So it must be something else.
The error basically says the VNC protocol handshake succeeded but the security handshake failed. If I read the code correctly, the password is not used at this point, so an incorrect password is likely not the cause either. Instead it seems that the server returns no "security types". The VNC standard has the following to say about this:
If number-of-security-types is zero, then for some reason the connection failed (e.g. the server can not support the desired protocol version). This is followed by a string describing the reason (where a string is specified as a length followed by that many ASCII characters):
Unfortunately os-autoinst does not read the reason so we don't know why it fails. We should likely implement reading the reason for the failure. Nevertheless the question is what might have changed on the server-side?
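To illustrate what reading the reason could look like, here is a minimal sketch in Python (os-autoinst itself is Perl, so this is not the actual implementation). It assumes an RFB 3.7 or later server, where the security handshake starts with a one-byte count of security types and, if that count is zero, continues with a 4-byte big-endian length and the reason string. The names `read_security_types`, `read_exact` and `VNCHandshakeError` are made up for this example.

```python
import io
import struct


class VNCHandshakeError(Exception):
    """Hypothetical error type for a refused security handshake."""


def read_exact(stream, n):
    """Read exactly n bytes from a binary stream or fail loudly."""
    data = stream.read(n)
    if len(data) != n:
        raise VNCHandshakeError(f"expected {n} bytes, got {len(data)}")
    return data


def read_security_types(stream):
    """Return the list of security types offered by the server.

    Assumes the ProtocolVersion exchange has already happened and that
    the server speaks RFB 3.7 or later.
    """
    (count,) = struct.unpack("!B", read_exact(stream, 1))
    if count == 0:
        # Zero security types: the server sends a length-prefixed reason.
        (length,) = struct.unpack("!I", read_exact(stream, 4))
        reason = read_exact(stream, length).decode("ascii", errors="replace")
        raise VNCHandshakeError(f"server refused connection: {reason}")
    # Each security type is a single unsigned byte.
    return list(read_exact(stream, count))


if __name__ == "__main__":
    # Simulated server reply: zero security types followed by a reason.
    reason = b"Too many connections"
    reply = io.BytesIO(b"\x00" + struct.pack("!I", len(reason)) + reason)
    try:
        read_security_types(reply)
    except VNCHandshakeError as err:
        print(err)
```

With a real connection the stream could be obtained via `sock.makefile("rb")`. If the server process dies before sending the reason, `read_exact` would report a short read instead of a reason string.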
Updated by SLindoMansilla over 5 years ago
Interesting, maybe this? https://bugzilla.suse.com/show_bug.cgi?id=1129412
Wrong SLE version. Also the VNC is happening on the worker, not the SUT.
Updated by riafarov over 5 years ago
SLindoMansilla wrote:
Interesting, maybe this? https://bugzilla.suse.com/show_bug.cgi?id=1129412
Not sure if that one is a duplicate of https://bugzilla.suse.com/show_bug.cgi?id=1129073 but that's for SLE 15, so it should not affect SLE 12.
Updated by SLindoMansilla over 5 years ago
I don't find any zVM job failing on this. Could you provide a link?
I think only zKVM is affected.
I also think that it is sporadic, because not all zKVM jobs failed.
(I think IRC doesn't work for me over VPN)
Updated by SLindoMansilla over 5 years ago
zVM affected job found: https://openqa.suse.de/tests/2745246
Updated by SLindoMansilla over 5 years ago
- Description updated (diff)
Sporadically reproducible on workers:
- grenache-1 for zKVM tests
- openqaworker2 for zVM tests
Examples of passed jobs:
Updated by SLindoMansilla over 5 years ago
Updated by mkittler over 5 years ago
I tried to test my PR https://github.com/os-autoinst/os-autoinst/pull/1137 locally to see what error message the server gives us. However, it didn't work. In the first place because I hadn't set WORKER_HOSTNAME and then because grenache-1:13 was apparently re-enabled and interfered with my testing (I'm using its virsh config for my local tests).
Updated by mkittler over 5 years ago
@mgriessmeier tested a previous build and it worked. So it is likely really just the product bug which has already been mentioned: https://bugzilla.suse.com/show_bug.cgi?id=1129412
Of course the error handling on the os-autoinst-side should be better:
- The error from the VNC server should be returned. That is what I'm attempting here: https://github.com/os-autoinst/os-autoinst/pull/1137
- It should be a test failure and not an incomplete - at least if we actually can connect to the VNC server but it behaves unexpectedly (like here).
Updated by mkittler over 5 years ago
PR for getting rid of Can't locate object method "blessed" via package " in the error message: https://github.com/os-autoinst/os-autoinst/pull/1139
Updated by riafarov over 5 years ago
So, as we suspected, it's a bug on s390x https://bugzilla.suse.com/show_bug.cgi?id=1131569 which breaks the VNC connection.
@mkittler: Thanks for improving handling of such cases!
Updated by mkittler over 5 years ago
- Status changed from New to Resolved
If it is a product bug, there's nothing to fix on the openQA side apart from providing a better error message. The mentioned PR has already been merged.
However, my attempt to read the error message from the VNC server wasn't successful (see https://github.com/os-autoinst/os-autoinst/pull/1137#issuecomment-483311767). But if the server is killed by the OOM killer that's actually no surprise.
Since reading the VNC server error is not helpful for this issue, I am marking it as resolved even though that PR hasn't been merged yet.