action #152578

closed

Many incompletes with "Error connecting to VNC server <unreal6.qe.nue2.suse.org:...>" size:M

Added by tinita 5 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2023-12-13
Due date:
% Done:

0%

Estimated time:

Description

Observation

See also #152569 / #152560

There seems to be a problem connecting to unreal6.qe.nue2.suse.org for several weeks now.

https://openqa.suse.de/tests/13062217

Reason: backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.org:5904>: IO::Socket::INET: connect: Connection refused
select count(id), substring(reason from 0 for 70) as reason_substr from jobs where t_finished >= '2023-11-01T00:00:00' and result = 'incomplete' group by reason_substr order by count(id) desc;
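For reference, the aggregation this query performs (counting jobs by a truncated prefix of the reason string) can be sketched in plain Python. The sample reasons below are illustrative, not real query output; note that PostgreSQL's `substring(reason from 0 for 70)` yields characters 1 through 69, because the length is counted from the (non-existent) position 0:

```python
from collections import Counter

# Illustrative job reasons (hypothetical sample data, not taken from the database)
reasons = [
    "backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.org:5904>: IO::Socket::INET: connect: Connection refused",
    "backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.org:5901>: IO::Socket::INET: connect: Connection refused",
    "tests died: unable to load main.pm",
]

# Group by the first 69 characters, mirroring substring(reason from 0 for 70)
counts = Counter(r[:69] for r in reasons)
for n, prefix in sorted(((n, p) for p, n in counts.items()), reverse=True):
    print(n, prefix)
```

Grouping on a prefix like this collapses reasons that differ only in volatile details (here, the VNC port number) into one bucket, which is why the real query's output ends mid-hostname at "suse.or".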

Suggestions

Problem

  • H1 REJECTED The product has changed -> It happened for several different builds, and there were also successful tests with the same builds

    • H1.1 product changed slightly but in an acceptable way without the need for communication with DEV+RM --> adapt test
    • H1.2 product changed slightly but in an acceptable way found after feedback from RM --> adapt test
    • H1.3 product changed significantly --> after approval by RM adapt test
  • H2 Fails because of changes in test setup

    • H2.1 Our test hardware equipment behaves differently
    • H2.2 The network behaves differently
  • H3 Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA

  • H4 Fails because of changes in test management configuration, e.g. openQA database settings

  • H5 Fails because of changes in the test software itself (the test plan in source code as well as needles)

  • H6 Sporadic issue, i.e. the root problem is already hidden in the system for a long time but does not show symptoms every time


Related issues 3 (1 open, 2 closed)

Related to openQA Project - action #152569: Many incomplete jobs endlessly restarted over several weeks size:M (Resolved, tinita, 2023-12-13 to 2024-01-12)

Related to openQA Project - action #152560: [alert] Incomplete jobs (not restarted) of last 24h alert Salt (Resolved, tinita, 2023-12-13)

Related to openQA Project - action #76813: [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" (New, 2020-10-30)

Actions #1

Updated by tinita 5 months ago

  • Related to action #152569: Many incomplete jobs endlessly restarted over several weeks size:M added
Actions #2

Updated by tinita 5 months ago

  • Related to action #152560: [alert] Incomplete jobs (not restarted) of last 24h alert Salt added
Actions #3

Updated by livdywan 5 months ago

  • Subject changed from Many incompletes with "Error connecting to VNC server <unreal6.qe.nue2.suse.org:...>" to Many incompletes with "Error connecting to VNC server <unreal6.qe.nue2.suse.org:...>" size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by okurz 4 months ago · Edited

Crosschecking with "last good" build retriggered

openqa-clone-job --within-instance https://openqa.suse.de/tests/12841342 {TEST,BUILD}+=-poo152578 _GROUP=0

-> https://openqa.suse.de/tests/13112758

This failed with what looks like the same problem, so either there is no product regression, or, despite using the old "last good" ISO, the test does not reflect the complete state of the product as of "last good".

Potentially related:

tinita and I monitored "virsh list" on unreal6 while a test was running, and we observed that the VM was not running, or no longer running, after the initial steps.
This reminds me of
https://bugzilla.suse.com/show_bug.cgi?id=1209245
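As a side note, the "Connection refused" in the job reasons is the plain TCP-level error raised when nothing is listening on the VNC port (consistent with the VM no longer running). A minimal probe, a sketch only, with the hostname and port taken from the error message, could look like:

```python
import socket

def vnc_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # No listener on the port -- matches
        # "IO::Socket::INET: connect: Connection refused"
        return False
    except OSError:
        # Timeouts, unreachable hosts, DNS failures are different symptoms
        return False

# Example (host and port from the job's reason string):
# vnc_port_open("unreal6.qe.nue2.suse.org", 5904)
```

Distinguishing "refused" from a timeout matters here: refused means the host is up but the VNC server (i.e. the VM) is gone, which points at the hypervisor rather than the network.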

Actions #5

Updated by tinita 4 months ago

  • Description updated (diff)
  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #6

Updated by tinita 4 months ago

I looked into how many incompletes we had in the last days.
It turns out that since December 17 only investigate jobs have been hitting that problem:

openqa=> select count(id), test, substring(reason from 0 for 70) as reason_substr from jobs where t_finished >= '2023-12-17T00:00:00' and result = 'incomplete' and reason like '%unreal6%' group by reason_substr, test order by count(id) desc;
 count |                                            test                                            |                             reason_substr                             
-------+--------------------------------------------------------------------------------------------+-----------------------------------------------------------------------
   378 | jeos-base+sdk+desktop:investigate:last_good_tests:f99df70fc4702425fc55668a06d45bc639bf5056 | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   206 | jeos-filesystem:investigate:retry                                                          | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   205 | jeos-extratest:investigate:retry                                                           | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   205 | jeos-kdump:investigate:retry                                                               | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   204 | jeos-containers-docker:investigate:retry                                                   | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   107 | memtest:investigate:retry                                                                  | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   106 | memtest:investigate:last_good_tests:3a3104f2ab3bc31d94191dc20635f191ef914fe2               | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   105 | jeos-base+sdk+desktop:investigate:retry                                                    | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   105 | jeos-containers-podman:investigate:retry                                                   | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   104 | jeos-fs_stress:investigate:retry                                                           | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   102 | jeos-main:investigate:last_good_tests:62553f401b66a1ec01fa037476113a1a42016150             | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   100 | jeos-extratest:investigate:last_good_tests:f99df70fc4702425fc55668a06d45bc639bf5056        | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
   100 | jeos-fips:investigate:retry                                                                | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
    10 | memtest-poo152578                                                                          | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
     1 | jeos-extratest:investigate:last_good_build:2.37                                            | backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.or
(15 rows)

Real test failures could reappear once the corresponding tests are scheduled again, though.
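A quick way to tell the remaining investigate noise apart from real test failures is to filter on the `:investigate:` infix that openQA inserts into the names of investigation jobs; a small sketch, using test names from the table above:

```python
def is_investigate(test_name: str) -> bool:
    """Heuristic: openQA investigation jobs carry ':investigate:' in their test name."""
    return ":investigate:" in test_name

tests = [
    "jeos-filesystem:investigate:retry",
    "memtest:investigate:last_good_tests:3a3104f2ab3bc31d94191dc20635f191ef914fe2",
    "memtest-poo152578",
]

real = [t for t in tests if not is_investigate(t)]
print(real)  # only the manually cloned memtest-poo152578 job remains
```

The same condition can of course be expressed directly in the SQL above with `test not like '%:investigate:%'`.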

Actions #7

Updated by tinita 4 months ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by okurz 4 months ago

  • Due date set to 2024-01-12
  • Priority changed from High to Normal
  • Target version changed from Ready to Tools - Next

OK, it can take some time until we get useful feedback, so I am down-prioritizing; you already did the good initial steps.

Actions #9

Updated by okurz 2 months ago

  • Related to action #76813: [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" added
Actions #10

Updated by okurz 2 months ago

  • Due date deleted (2024-01-12)
  • Status changed from Feedback to Workable
  • Assignee deleted (tinita)
  • Target version changed from Tools - Next to Ready

After 2 months: tinita has confirmed that there was no response in Slack, and apparently nobody else asked. We should mob on this issue as a team.

Actions #11

Updated by tinita about 2 months ago

  • Status changed from Workable to Resolved
  • Assignee set to tinita

I had another look: the last incomplete with that message is from February 19, and the most recent test actually passed: https://openqa.suse.de/tests/13062217#next_previous

So I think this can be resolved now.
