action #73525
coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA
Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry
0%
Description
Observation
Job https://openqa.suse.de/tests/4847590 is incomplete; the logs show:
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::svirt::start_serial_grab(name="openQA-SUT-1")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::new_ssh_connection(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.740 CEST] [debug] SSH connection to root@s390p8.suse.de established
[2020-10-19T02:18:59.790 CEST] [debug] svirt: grabbing serial console
Connected to domain openQA-SUT-1
Escape character is ^]
[2020-10-19T02:19:00.058 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 932.
See more details in https://openqa.suse.de/tests/4847590/file/autoinst-log.txt
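For triage it can help to extract exactly the lines the backend reports after its "Backend process died" marker. The following is a hypothetical helper, not part of openQA or os-autoinst. It assumes the marker text shown in the log above, assumes the error block ends at the next line starting with `[` (the next timestamped line), and strips ANSI color sequences such as the `[37m`/`[0m` codes that can show up in a raw autoinst-log.txt.

```python
import re
from typing import List

# ANSI SGR color sequences such as "\x1b[37m" and "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")
# Marker text as it appears in the logs quoted in this ticket.
MARKER = "Backend process died, backend errors are reported below in the following lines:"


def backend_errors(log_text: str) -> List[str]:
    """Return the error lines reported after the backend-died marker.

    Heuristic (assumption): the error block ends at the next line that starts
    with "[" (i.e. the next timestamped log line) or at the end of the log.
    """
    lines = [ANSI_SGR.sub("", line) for line in log_text.splitlines()]
    for i, line in enumerate(lines):
        if MARKER in line:
            errors = []
            for follow in lines[i + 1:]:
                if follow.startswith("["):
                    break
                if follow.strip():
                    errors.append(follow.strip())
            return errors
    return []
```

Applied to the excerpt above, this would yield the single "unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 932." line, which is the text the auto_review expression in the subject is meant to match.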
Updated by okurz almost 4 years ago
- Subject changed from Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*" to Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry
- Category set to Feature requests
- Priority changed from Normal to Low
- Target version changed from Ready to future
Setting as "Feature request" because this mainly looks like misleading error output where the root cause is not obvious. This can certainly be improved. I don't have a better clue, so I added ":retry" in the hope that it helps in some cases. But I also don't see anything we can do right now. There are other tickets in related areas, so we might come back to this or solve it implicitly elsewhere anyway.
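To my understanding, the `auto_review:"<regex>"` expression in the subject is what the automatic labeling of known issues (e.g. the openqa-label-known-issues script in os-autoinst/scripts) matches against failed jobs, and the trailing `:retry` additionally asks for a matching job to be restarted. The sketch below only illustrates that idea and is not the actual implementation: the subject regex, the decision strings and matching against the job's reason text (rather than the full log) are assumptions.

```python
import re

# Assumed subject convention: auto_review:"<regex>" optionally followed by ":retry".
AUTO_REVIEW = re.compile(r'auto_review:"(?P<regex>[^"]+)"(?P<retry>:retry)?')


def parse_subject(subject: str):
    """Return (compiled regex, retry flag), or None if there is no auto_review."""
    m = AUTO_REVIEW.search(subject)
    if m is None:
        return None
    return re.compile(m.group("regex")), m.group("retry") is not None


def review(subject: str, reason: str) -> str:
    """Decide what a simplified auto-review would do with a failure reason."""
    parsed = parse_subject(subject)
    if parsed is None:
        return "no auto_review expression"
    pattern, retry = parsed
    if pattern.search(reason) is None:
        return "no match"
    # A match labels the job with this ticket; ":retry" also restarts it.
    return "label and retry" if retry else "label only"
```

With the subject from before the change, a matching reason would only be labeled; with the new subject it would also be retried, which is the effect hoped for here.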
Updated by okurz almost 4 years ago
- Related to action #75019: s390 job via ppc64le worker incompletes on failure to connect to VNC due to "Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/ppc64le-linux-thread-multi/IO/Handle.pm" added
Updated by okurz almost 4 years ago
- Related to action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused" added
Updated by okurz almost 4 years ago
- Related to action #45062: Better visualization of incompletes - show module in which incomplete happens added
Updated by okurz almost 4 years ago
- Related to coordination #62420: [epic] Distinguish all types of incompletes added
Updated by okurz almost 4 years ago
- Related to deleted (action #45062: Better visualization of incompletes - show module in which incomplete happens)
Updated by okurz over 2 years ago
A recent example from today: https://openqa.opensuse.org/tests/2359276/
Updated by ggardet_arm about 2 years ago
This happens quite often on openqa-aarch64: https://openqa.opensuse.org/admin/workers/154 https://openqa.opensuse.org/admin/workers/158
[2022-07-05T12:52:29.388605+02:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.
Updated by okurz about 2 years ago
The problem cannot be the same, as this ticket is obviously very old. So yes, the message does not provide many details, but it cannot explain a recent rise in problems, if that is what you observe.
Updated by mkittler about 2 years ago
Besides, this ticket is likely svirt specific.
And yes, this ticket is also too old. It looks like the first occurrence of @ggardet_arm's bug is:
2357381 | 2022-05-19 20:09:39 | incomplete | backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.
Since then, the log of incompletes on openqa-aarch64 has been significantly dominated by this error. Unfortunately, I'm not sure what causes it. Maybe the culprit is https://github.com/os-autoinst/os-autoinst/commit/d1adda78adc34c5ac02b5040a2bc0e97eaa83827 (and by extension https://github.com/os-autoinst/os-autoinst/commit/93ff454deae61e573a9cbf88f172304002fb83a4). In my tests/investigation with svirt jobs, this change was an overall improvement. However, I can imagine that in certain cases it would be better to block longer on reads rather than give up and possibly be unable to recover. I suppose the timeouts should be handled more sensibly. We should create a separate ticket for that problem.
EDIT: I've created #113282.
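To illustrate the trade-off between blocking longer on reads and giving up, here is a minimal Python sketch of a "read exactly n bytes" helper with an overall deadline. It is hypothetical and not the os-autoinst code (VNC.pm is Perl, and its actual read and timeout logic may differ); it only shows that the peer closing the connection early and a deadline expiring are two distinct failures, and that a larger `timeout` simply means waiting longer before giving up.

```python
import socket
import time


def read_exactly(sock: socket.socket, n: int, timeout: float) -> bytes:
    """Read exactly n bytes from sock, waiting at most `timeout` seconds in total.

    Raises EOFError if the peer closes the connection before n bytes arrive
    (the analogue of "unexpected end of data") and TimeoutError if the
    deadline passes first.
    """
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while len(buf) < n:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"got {len(buf)} of {n} bytes before the deadline")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(n - len(buf))
        except socket.timeout:
            raise TimeoutError(f"got {len(buf)} of {n} bytes before the deadline") from None
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise EOFError(f"unexpected end of data: got {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)
```

Keeping `EOFError` and `TimeoutError` separate is what would let a caller decide whether reconnecting or just waiting longer makes more sense.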
Updated by favogt about 1 year ago
This affects ppc64le in weird ways: https://openqa.opensuse.org/tests/3462935#next_previous
multi_users_dm fails because of some screen refresh issues. When connecting to VNC through an SSH tunnel with vncviewer -Shared, the screen refreshes and the test continues for a bit, until it stops again. Then VNC traffic stops completely and not even new connections can be established. Eventually the worker process kills QEMU.
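When VNC traffic stalls like this, one way to check whether the server still accepts new connections at all is to open a TCP connection and read the 12-byte RFB ProtocolVersion greeting (e.g. `RFB 003.008\n`) that a VNC server sends first, per RFC 6143. The helper names and default timeout below are hypothetical; host and port would be whatever the SSH tunnel forwards.

```python
import re
import socket
from typing import Tuple

# RFC 6143, section 7.1.1: the server first sends "RFB xxx.yyy\n" (12 bytes).
RFB_GREETING = re.compile(rb"RFB (\d{3})\.(\d{3})\n")
GREETING_LEN = 12


def parse_protocol_version(greeting: bytes) -> Tuple[int, int]:
    """Return (major, minor) from an RFB ProtocolVersion greeting."""
    m = RFB_GREETING.fullmatch(greeting)
    if m is None:
        raise ValueError(f"not an RFB greeting: {greeting!r}")
    return int(m.group(1)), int(m.group(2))


def probe_vnc(host: str, port: int, timeout: float = 5.0) -> Tuple[int, int]:
    """Connect to a VNC server and return the protocol version it announces.

    Raises OSError (e.g. connection refused or timed out) if no connection
    can be made, and EOFError if the server closes before sending 12 bytes.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        greeting = b""
        while len(greeting) < GREETING_LEN:
            chunk = sock.recv(GREETING_LEN - len(greeting))
            if not chunk:
                raise EOFError("connection closed before RFB greeting was complete")
            greeting += chunk
    return parse_protocol_version(greeting)
```

For a server speaking RFB 3.8, `probe_vnc("localhost", 5901)` would return `(3, 8)`; a stalled server would instead surface as a timeout, a refused connection or an `EOFError`.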
Updated by openqa_review 12 months ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3623176#step/multi_users_dm/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 80 days if nothing changes in this ticket.
Updated by openqa_review 8 months ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3754449#step/multi_users_dm/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 196 days if nothing changes in this ticket.