action #73525
open
coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA
Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry
Added by Xiaojing_liu about 4 years ago.
Updated 12 months ago.
Category:
Feature requests
Description
Observation
job https://openqa.suse.de/tests/4847590 is incomplete, the logs show:
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::svirt::start_serial_grab(name="openQA-SUT-1")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::new_ssh_connection(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.740 CEST] [debug] SSH connection to root@s390p8.suse.de established
[2020-10-19T02:18:59.790 CEST] [debug] svirt: grabbing serial console
Connected to domain openQA-SUT-1
Escape character is ^]
[2020-10-19T02:19:00.058 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 932.
See more details in https://openqa.suse.de/tests/4847590/file/autoinst-log.txt
- Target version set to Ready
- Subject changed from Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*" to Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry
- Category set to Feature requests
- Priority changed from Normal to Low
- Target version changed from Ready to future
Setting as "Feature request" because this looks mainly like misleading error output where the root cause is not obvious. This can certainly be improved. I don't have a better clue, so I added ":retry" in the hope that this helps in some cases. Beyond that, I don't see anything we can do right now. There are other tickets in related areas, so we might come back to this or solve it implicitly elsewhere.
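To illustrate what the ":retry" suffix in the subject is meant to achieve, here is a minimal sketch of how an auto_review expression could be pulled out of a ticket subject and matched against a job's failure reason. This is not the actual openQA tooling: the helper names `parse_auto_review` and `should_retry` are hypothetical, and the assumption is simply that the quoted part of the subject is treated as a regular expression and that ":retry" marks matching jobs for a restart.

```python
import re

# Hypothetical sketch, not the real openQA auto-review implementation.
# Assumption: a subject carries auto_review:"<regex>" with an optional :retry suffix.
AUTO_REVIEW = re.compile(r'auto_review:"(?P<pattern>.+?)"(?P<retry>:retry)?')


def parse_auto_review(subject):
    """Return (compiled regex, retry flag) or None if the subject has no auto_review part."""
    match = AUTO_REVIEW.search(subject)
    if match is None:
        return None
    return re.compile(match.group("pattern")), match.group("retry") is not None


def should_retry(subject, reason):
    """True if the subject asks for a retry and its regex matches the job's reason."""
    parsed = parse_auto_review(subject)
    if parsed is None:
        return False
    pattern, retry = parsed
    return retry and pattern.search(reason) is not None


subject = ('Job incompletes with auto_review:"backend died: unexpected end of data '
           'at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry')
reason = ("backend died: unexpected end of data at "
          "/usr/lib/os-autoinst/consoles/VNC.pm line 932.")
print(should_retry(subject, reason))
```

Because the expression ends in `.*`, it matches regardless of the line number reported in VNC.pm, which is why the same subject covers both the "line 932" and the later "line 183" variants of the message.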
- Related to action #75019: s390 job via ppc64le worker incompletes on failure to connect to VNC due to "Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/ppc64le-linux-thread-multi/IO/Handle.pm" added
- Related to action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused" added
- Related to action #45062: Better visualization of incompletes - show module in which incomplete happens added
- Related to deleted (action #45062: Better visualization of incompletes - show module in which incomplete happens)
- Parent task set to #62420
- Parent task changed from #62420 to #102909
This cannot be the same problem, since this ticket is very old. The message indeed does not provide many details, but this ticket cannot explain a recent rise in problems, if that is what you are observing.
Besides, this ticket is likely svirt specific.
And yes, this ticket is also too old. It looks like the first occurrence of @ggardet_arm's bug is 2357381 | 2022-05-19 20:09:39 | incomplete | backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.
and since then the log of incompletes on openqa-aarch64 has been dominated by this error. Unfortunately I'm not sure what causes it. Maybe the culprit is https://github.com/os-autoinst/os-autoinst/commit/d1adda78adc34c5ac02b5040a2bc0e97eaa83827 (and by extension https://github.com/os-autoinst/os-autoinst/commit/93ff454deae61e573a9cbf88f172304002fb83a4). In my tests/investigation with svirt jobs this change was an overall improvement. However, I can imagine that in certain cases it would be better to block longer on reads instead of giving up and possibly not being able to recover. I suppose the timeouts should be handled more sensibly. We should create a separate ticket for that problem.
EDIT: I've created #113282.
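As a rough illustration of the "block longer on reads instead of giving up" idea, the sketch below reads a fixed number of bytes from a socket, tolerates short reads and temporary silence until an overall deadline, and only fails early when the peer really closes the connection. It is written in Python purely for illustration and is not the os-autoinst VNC console code; the function name `read_exact` and the parameters `deadline_s` and `poll_s` are hypothetical.

```python
import socket
import time


def read_exact(sock, length, deadline_s=5.0, poll_s=0.1):
    """Read exactly `length` bytes, waiting through short reads until a deadline."""
    deadline = time.monotonic() + deadline_s
    chunks = []
    remaining = length
    while remaining > 0:
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError(
                f"got {length - remaining} of {length} bytes before the deadline")
        # Wait at most poll_s per attempt, but never beyond the overall deadline.
        sock.settimeout(min(poll_s, left))
        try:
            chunk = sock.recv(remaining)
        except socket.timeout:
            continue  # no data yet: keep waiting instead of giving up
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError("unexpected end of data: peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The point of the design is to separate "no data yet" (retry until the deadline) from "connection closed" (fail immediately), so that a slow sender does not get reported with the same error as a dead one.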
This affects ppc64le in weird ways: https://openqa.opensuse.org/tests/3462935#next_previous
multi_users_dm fails because of some screen refresh issues. When connecting to VNC through an SSH tunnel with vncviewer -Shared, the screen refreshes and the test continues for a bit, until it stops again. Then VNC traffic completely stops and not even new connections can be established. Eventually the worker process kills QEMU.
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3623176#step/multi_users_dm/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 80 days if nothing changes in this ticket.
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3754449#step/multi_users_dm/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 196 days if nothing changes in this ticket.