action #73525

open

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #102909: [epic] Prevent more incompletes already within os-autoinst or openQA

Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry

Added by Xiaojing_liu over 3 years ago. Updated 2 months ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2020-10-19
Due date:
% Done: 0%
Estimated time:

Description

Observation

job https://openqa.suse.de/tests/4847590 is incomplete, the logs show:

[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::svirt::start_serial_grab(name="openQA-SUT-1")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.635 CEST] [debug] <<< backend::baseclass::new_ssh_connection(username="root", password="SECRET", hostname="s390p8.suse.de")
[2020-10-19T02:18:59.740 CEST] [debug] SSH connection to root@s390p8.suse.de established
[2020-10-19T02:18:59.790 CEST] [debug] svirt: grabbing serial console
Connected to domain openQA-SUT-1
Escape character is ^]
[2020-10-19T02:19:00.058 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 932.

See more details in https://openqa.suse.de/tests/4847590/file/autoinst-log.txt


Related issues 2 (1 open, 1 closed)

Related to openQA Project - action #75019: s390 job via ppc64le worker incompletes on failure to connect to VNC due to "Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/ppc64le-linux-thread-multi/IO/Handle.pm" (New, 2020-10-21)

Related to openQA Project - action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused" (Rejected, okurz, 2020-09-11)

Actions #1

Updated by okurz over 3 years ago

  • Target version set to Ready
Actions #2

Updated by okurz over 3 years ago

  • Subject changed from Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*" to Job incompletes with auto_review:"backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry
  • Category set to Feature requests
  • Priority changed from Normal to Low
  • Target version changed from Ready to future

Setting as "Feature request" because this looks like mainly misleading error output with the root cause not obvious. This can certainly be improved. I don't have a better clue so I added ":retry" in the hope that this helps in some cases. But also I don't see anything we can do right now. There are other tickets in related areas so we might come back to this or solve it anyway implicitly elsewhere.
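For readers unfamiliar with the subject convention, here is a minimal sketch of how an auto_review:"<regex>":retry marker could be read from a ticket subject and matched against a job's incomplete reason. It only illustrates the convention visible in this ticket's subject; the helper names parse_auto_review and matches are hypothetical and not part of openQA or any of its scripts.

```python
import re

# Assumption: the marker has the shape auto_review:"<regex>" optionally
# followed by :retry, as seen in this ticket's subject.
AUTO_REVIEW = re.compile(r'auto_review:"(?P<regex>.+?)"(?P<retry>:retry)?')


def parse_auto_review(subject):
    """Return (regex, retry_flag) from a ticket subject, or None if absent."""
    m = AUTO_REVIEW.search(subject)
    if m is None:
        return None
    return m.group("regex"), m.group("retry") is not None


def matches(subject, reason):
    """Return (reason_matches, should_retry) for a job's incomplete reason."""
    parsed = parse_auto_review(subject)
    if parsed is None:
        return (False, False)
    regex, retry = parsed
    hit = re.search(regex, reason) is not None
    return (hit, hit and retry)


subject = ('Job incompletes with auto_review:"backend died: unexpected end of '
           'data at /usr/lib/os-autoinst/consoles/VNC.pm.*":retry')
reason = ("backend died: unexpected end of data at "
          "/usr/lib/os-autoinst/consoles/VNC.pm line 183.")
print(matches(subject, reason))  # (True, True)
```

Because the regex ends in ".*", the same subject matches both the "line 932" and the "line 183" variants of the message quoted in this ticket.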

Actions #3

Updated by okurz over 3 years ago

  • Related to action #75019: s390 job via ppc64le worker incompletes on failure to connect to VNC due to "Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/ppc64le-linux-thread-multi/IO/Handle.pm" added
Actions #4

Updated by okurz over 3 years ago

  • Related to action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused" added
Actions #5

Updated by okurz over 3 years ago

  • Related to action #45062: Better visualization of incompletes - show module in which incomplete happens added
Actions #6

Updated by okurz over 3 years ago

Actions #7

Updated by okurz over 3 years ago

  • Related to deleted (action #45062: Better visualization of incompletes - show module in which incomplete happens)
Actions #8

Updated by okurz over 3 years ago

  • Parent task set to #62420
Actions #9

Updated by okurz over 2 years ago

  • Parent task changed from #62420 to #102909
Actions #10

Updated by okurz almost 2 years ago

A recent example from today: https://openqa.opensuse.org/tests/2359276/

Actions #11

Updated by ggardet_arm over 1 year ago

This happens quite often on openqa-aarch64: https://openqa.opensuse.org/admin/workers/154 https://openqa.opensuse.org/admin/workers/158

[2022-07-05T12:52:29.388605+02:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.
Actions #12

Updated by okurz over 1 year ago

The problem cannot be the same one, as this ticket is obviously very old. So yes, the message does not provide a lot of detail, but this ticket cannot explain a recent rise in problems, in case you observe one.

Actions #13

Updated by mkittler over 1 year ago

Besides, this ticket is likely svirt specific.

And yes, this ticket is also too old. It looks like the first occurrence of @ggardet_arm's bug is

2357381 | 2022-05-19 20:09:39 | incomplete | backend died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 183.

and since then the log of incompletes on openqa-aarch64 has been significantly dominated by this error. Unfortunately I'm not sure what causes it. Maybe the culprit is https://github.com/os-autoinst/os-autoinst/commit/d1adda78adc34c5ac02b5040a2bc0e97eaa83827 (and by extension https://github.com/os-autoinst/os-autoinst/commit/93ff454deae61e573a9cbf88f172304002fb83a4). In my tests/investigation with svirt jobs this change was an overall improvement. However, I can imagine that in certain cases it would be better to block longer on reads instead of giving up and possibly not being able to recover. I suppose the timeouts should be handled more sensibly. We should create a separate ticket for that problem.

EDIT: I've created #113282.
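To illustrate the trade-off between giving up on a short read and blocking longer, here is a rough sketch of a "read exactly n bytes" loop that tolerates individual read timeouts until an overall deadline expires. This is not the code in VNC.pm and makes no claim about how os-autoinst handles timeouts; read_exact and both timeout parameters are hypothetical names chosen for the example.

```python
import socket
import time


def read_exact(sock, n, per_read_timeout=1.0, overall_deadline=30.0):
    """Read exactly n bytes from sock.

    A single timed-out read is retried; only an expired overall deadline
    or a closed connection ends the attempt.
    """
    sock.settimeout(per_read_timeout)
    deadline = time.monotonic() + overall_deadline
    buf = bytearray()
    while len(buf) < n:
        try:
            chunk = sock.recv(n - len(buf))
        except socket.timeout:
            # No data within per_read_timeout: keep waiting unless the
            # overall deadline has passed.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"got {len(buf)} of {n} bytes before the deadline")
            continue
        if not chunk:
            # An empty read means the peer closed the connection.
            raise EOFError(
                f"connection closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)
```

Separating "peer closed the connection" (EOFError) from "peer is slow" (TimeoutError) also gives a more specific message than a generic end-of-data error, which is the kind of improvement this ticket asks for.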

Actions #14

Updated by favogt 8 months ago

This affects ppc64le in weird ways: https://openqa.opensuse.org/tests/3462935#next_previous

multi_users_dm fails because of some screen refresh issues. When connecting to VNC through an SSH tunnel with vncviewer -Shared, the screen refreshes and the test continues for a bit, until it stops again. Then VNC traffic completely stops and not even new connections can be established. Eventually the worker process kills QEMU.

Actions #16

Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3623176#step/multi_users_dm/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 80 days if nothing changes in this ticket.

Actions #17

Updated by openqa_review 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_on_kde
https://openqa.opensuse.org/tests/3754449#step/multi_users_dm/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 196 days if nothing changes in this ticket.
