Project

General

Profile

action #76813

coordination #109668: [saga][epic] Stable and updated non-qemu backends for SLE validation

coordination #109656: [epic] Stable non-qemu backends

[tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused":retry

Added by mkittler over 1 year ago. Updated 2 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Feature requests
Target version:
Start date:
2020-10-30
Due date:
% Done:

0%

Estimated time:
Difficulty:
Tags:

Description

Observation

[2022-01-13T13:57:24.238682+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#76813

Acceptance criteria

  • AC1: It is clear to test reviewers what needs to be done in the next step when reading the reason

Suggestions

This looks like a temporary problem with the svirt host (here openqaw5-xen.qa.suse.de). But it is not clear whether the problem is always temporary and an automatic restart would make sense. I mainly created this ticket so jobs like this one won't show up in auto-review anymore.

  • What workers are affected? Is it openqaworker2 and grenache only? Yes, obviously because only them use the svirt backend this way.
  • Focus on tests trying to connect to a remote machine
  • Add better log messages explaining to test reviewers what the message could mean
  • Make sure the test ends with a proper reason propagated into the job results

Out of scope

  • VNC running on localhost, see #105882 about that

Workaround

Just retry


Related issues

Related to openQA Project - action #104520: Move svirt extract_asset code from os-autoinst-distri-opensuse to os-autoinst/backend/svirt.pm size:M auto_review:"unable to extract assets: Can't call method.+name.+on an undefined value":retryWorkable2021-12-29

Related to openQA Tests - action #99345: [tools][qem] Incomplete test runs on s390x with auto_review:"backend died: Error connecting to VNC server.*s390.*Connection timed out":retry size:MIn Progress2021-09-27

Related to openQA Project - action #107026: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) size:MResolved2022-02-17

Copied to openQA Project - action #105882: Test using svirt backend fails with auto_review:"Error connecting to VNC server.*localhost.*Connection refused"Resolved2020-10-30

Copied to openQA Project - action #106685: [tools] Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retryBlocked

History

#1 Updated by mkittler over 1 year ago

  • Project changed from openQA Infrastructure to openQA Project

#2 Updated by okurz over 1 year ago

  • Tags set to qac, jeos
  • Project changed from openQA Project to openQA Tests
  • Category set to Infrastructure
  • Assignee set to jlausuch

one more candidate for QAC as this is https://progress.opensuse.org/projects/qa/wiki#Out-of-scope for QE Tools :)

#3 Updated by jlausuch over 1 year ago

Not sure we are gonna fix this backend failure, however I keep the ticket to tag failing tests when it happens again.

#4 Updated by okurz over 1 year ago

jlausuch wrote:

Not sure we are gonna fix this backend failure, however I keep the ticket to tag failing tests when it happens again.

Maybe I can help to provide some additional information for you to be able to fix it? If a related failure only happens very sporadically probably it is ok to just have a ticket for reference but for the sake of long-term sustainability I suggest you as a team still invest in improving the overall situation. Likely there is not much of an issue to "fix" but to just improve the feedback from the backend if something like this happens. I mean "Connection refused" means that one thing tries to reach another thing which does not work. To help test reviewers to understand what is going on it could help to find out what are these "things", what are likely causes for this to happen, provide some additional information for debugging, e.g. is the target machine at least reachable? Are all prerequisities fulfilled, etc.

#5 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-base+phub@svirt-xen-pv
https://openqa.suse.de/tests/5303636

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#6 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs@64bit-ipmi
https://openqa.suse.de/tests/5433647

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#7 Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs@64bit-ipmi
https://openqa.suse.de/tests/5433647

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#8 Updated by openqa_review about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-base+phub@svirt-hyperv-uefi
https://openqa.suse.de/tests/5794800

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#9 Updated by tinita 10 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default@svirt-hyperv
https://openqa.suse.de/tests/6492141

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed

#10 Updated by tinita 10 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default@svirt-hyperv
https://openqa.suse.de/tests/6492141

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed

#11 Updated by openqa_review 9 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default@svirt-hyperv-uefi
https://openqa.suse.de/tests/6888454

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed

#12 Updated by openqa_review 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs@64bit-ipmi
https://openqa.suse.de/tests/7048944

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The label in the openQA scenario is removed

#13 Updated by openqa_review 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: select_modules_and_patterns+registration@svirt-hyperv-uefi
https://openqa.suse.de/tests/7217519

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#14 Updated by openqa_review 7 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-base+sdk+desktop@svirt-hyperv
https://openqa.suse.de/tests/7362868

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#15 Updated by jlausuch 7 months ago

  • Status changed from New to Workable
  • Assignee deleted (jlausuch)

Some PR that might solve the bootloader hyperv problem, but not the root cause of this issue
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13496

#16 Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-base+sdk+desktop@svirt-hyperv
https://openqa.suse.de/tests/7619895

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#17 Updated by openqa_review 5 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs@64bit-ipmi
https://openqa.suse.de/tests/7784343

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#18 Updated by okurz 5 months ago

  • Project changed from openQA Tests to Containers
  • Category deleted (Infrastructure)

jlausuch https://openqa.suse.de/tests/7619895 is jeos-base+sdk+desktop@svirt-hyperv, so that's in your domain, right?

#19 Updated by jlausuch 5 months ago

  • Project changed from Containers to openQA Tests

okurz wrote:

jlausuch https://openqa.suse.de/tests/7619895 is jeos-base+sdk+desktop@svirt-hyperv, so that's in your domain, right?

A couple of things:
First, that's a jeos job, so it's not "containers" domain :)
Second, It isn't really related to JeOS, as it's a backend issue.
Third, I haven't seen this issue happening since 1 month (at least in JeOS).

#20 Updated by okurz 5 months ago

  • Subject changed from Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" to [qe-core] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused"
  • Category set to Bugs in existing tests

uhm, ok. So https://openqa.suse.de/parent_group_overview/34#grouped_by_build unfortunately does not have a description. I assume it's QE-Core then?

#21 Updated by jlausuch 5 months ago

okurz wrote:

uhm, ok. So https://openqa.suse.de/parent_group_overview/34#grouped_by_build unfortunately does not have a description. I assume it's QE-Core then?

I don't think it "belongs" to any particular squad. It's a backend issue, it belongs (affects) to everyone, so I don't know what to answer :)

#22 Updated by cdywan 4 months ago

I just ran into it:

https://openqa.suse.de/tests/7968548

The second run https://openqa.suse.de/tests/7968555 failed the same way.

Edit: Third https://openqa.suse.de/tests/7968565

And the force https://openqa.suse.de/tests/7968567

Possibly interesting excerpt from the logs, looks to me like there's an I/O issue stalling or crashing VNC.

Couldn't get a file descriptor referring to the console
%@umount: /var/lib/nfs/rpc_pipefs: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)

Main Menu

0) <-- Back <--
1) Start Installation          
2) Settings               
3) Expert                
4) Exit or Reboot      

[2022-01-13T13:57:19.695523+01:00] [debug] no change: 56.9s
[2022-01-13T13:57:22.234352+01:00] [debug] WARNING: check_asserted_screen took 1.54 seconds for 15 candidate needles - make your needles more specific
[2022-01-13T13:57:22.234563+01:00] [debug] no match: 55.9s, best candidate: addon-products-20190923 (0.00)
[2022-01-13T13:57:22.234890+01:00] [debug] considering VNC stalled, no update for 5.07 seconds
[2022-01-13T13:57:24.238682+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:25.240025+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:26.241758+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:27.243136+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:28.244558+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:29.245954+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:30.247334+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
[2022-01-13T13:57:31.248793+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused      

#23 Updated by cdywan 4 months ago

  • Has duplicate action #99345: [tools][qem] Incomplete test runs on s390x with auto_review:"backend died: Error connecting to VNC server.*s390.*Connection timed out":retry size:M added

#24 Updated by cdywan 4 months ago

  • Related to action #104520: Move svirt extract_asset code from os-autoinst-distri-opensuse to os-autoinst/backend/svirt.pm size:M auto_review:"unable to extract assets: Can't call method.+name.+on an undefined value":retry added

#25 Updated by apappas 4 months ago

  • Subject changed from [qe-core] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" to [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused"
  • Status changed from Workable to Feedback

This ticket was discussed in the qe-core refinement session.

  1. Apparently it is a backend issue, and we as a team have neither the mission nor the skills to fix backend issues.
  2. The jobs linked are JeOS jobs, which are definitely not in our domain.

As a result, we ask for feedback from the qe-tools team. If they don't take up the ticket we will reject it.

#26 Updated by okurz 4 months ago

  • Description updated (diff)
  • Status changed from Feedback to New
  • Target version set to Ready

ok. I will pick it up for tools and we will once again see what we can do and check on the scope and responsibility.

#27 Updated by okurz 3 months ago

From today limit=50 openqa-query-for-job-label poo#76813

8089855|2022-02-03 08:56:42|done|incomplete|migration_offline_dvd_sle15sp1_ltss_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:47095>: IO::Socket::INET: connect: Connection refused|grenache-1
8089206|2022-02-03 06:02:49|done|incomplete|textmode_ha_minimal_base|backend died: Error connecting to VNC server <localhost:34795>: IO::Socket::INET: connect: Connection refused|grenache-1
8089197|2022-02-03 05:52:09|done|incomplete|migration_online_sle15sp3_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:45839>: IO::Socket::INET: connect: Connection refused|grenache-1
8089196|2022-02-03 05:51:13|done|incomplete|migration_online_sle15sp2_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:60041>: IO::Socket::INET: connect: Connection refused|grenache-1
8089195|2022-02-03 05:49:51|done|incomplete|migration_online_sle15sp2_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:60589>: IO::Socket::INET: connect: Connection refused|grenache-1
8089178|2022-02-03 05:39:40|done|incomplete|migration_offline_scc_sle15sp2_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:43133>: IO::Socket::INET: connect: Connection refused|grenache-1
8089171|2022-02-03 05:29:21|done|incomplete|migration_offline_scc_sle12sp4_ltss_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:40687>: IO::Socket::INET: connect: Connection refused|grenache-1
8088391|2022-02-03 05:26:32|done|incomplete|gi-guest_developing-on-host_sles15sp3-kvm|backend died: Error connecting to VNC server <localhost:40339>: IO::Socket::INET: connect: Connection refused|grenache-1
8089169|2022-02-03 05:22:35|done|incomplete|migration_offline_scc_sle12sp3_ltss_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:47485>: IO::Socket::INET: connect: Connection refused|grenache-1
8089166|2022-02-03 05:18:20|done|incomplete|ha_textmode_extended|backend died: Error connecting to VNC server <localhost:36727>: IO::Socket::INET: connect: Connection refused|grenache-1
8093592|2022-02-03 05:13:22|done|incomplete|ha_textmode_extended|backend died: Error connecting to VNC server <localhost:51349>: IO::Socket::INET: connect: Connection refused|grenache-1
8093590|2022-02-03 05:08:49|done|incomplete|textmode_ha_minimal_base|backend died: Error connecting to VNC server <localhost:47857>: IO::Socket::INET: connect: Connection refused|grenache-1
8093593|2022-02-03 02:17:10|done|incomplete|create_hdd_ha_textmode|backend died: Error connecting to VNC server <localhost:45515>: IO::Socket::INET: connect: Connection refused|grenache-1
8090209|2022-02-03 02:15:10|done|incomplete|jeos-filesystem|backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused|openqaworker2
8088393|2022-02-03 01:19:09|done|incomplete|gi-guest_sles15sp3-on-host_developing-kvm|backend died: Error connecting to VNC server <localhost:35653>: IO::Socket::INET: connect: Connection refused|grenache-1
8089763|2022-02-03 00:59:16|done|incomplete|virt-guest-migration-sles15sp3-from-sles15sp3-to-developing-xen-dst|backend died: Error connecting to VNC server <localhost:36179>: IO::Socket::INET: connect: Connection refused|openqaworker2
8090393|2022-02-03 00:53:27|done|incomplete|offline_sles15sp1_ltss_pscc_basesys-srv-lgm-pcm_def_full|backend died: Error connecting to VNC server <localhost:48449>: IO::Socket::INET: connect: Connection refused|grenache-1
8089761|2022-02-03 00:50:10|done|incomplete|virt-guest-migration-sles15sp3-from-sles15sp3-to-developing-kvm-dst|backend died: Error connecting to VNC server <localhost:37713>: IO::Socket::INET: connect: Connection refused|openqaworker2
8090379|2022-02-03 00:48:51|done|incomplete|offline_sles15sp3_pscc_base_all_minimal_zVM_preparation|backend died: Error connecting to VNC server <localhost:44575>: IO::Socket::INET: connect: Connection refused|openqaworker2
8089681|2022-02-03 00:40:10|done|incomplete|videomode_text+textmode+role_kvm|backend died: Error connecting to VNC server <localhost:38541>: IO::Socket::INET: connect: Connection refused|grenache-1
8089760|2022-02-03 00:37:49|done|incomplete|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-kvm-dst|backend died: Error connecting to VNC server <localhost:51769>: IO::Socket::INET: connect: Connection refused|openqaworker2
8090387|2022-02-03 00:34:38|done|incomplete|online_sles15sp1_ltss_pscc_basesys-srv-phub_def_full_y|backend died: Error connecting to VNC server <localhost:34933>: IO::Socket::INET: connect: Connection refused|grenache-1
8090383|2022-02-03 00:28:59|done|incomplete|online_sles15sp3_pscc_basesys-srv-lgm-pcm_def_full_zdup|backend died: Error connecting to VNC server <localhost:43337>: IO::Socket::INET: connect: Connection refused|grenache-1
8089756|2022-02-03 00:28:10|done|incomplete|virt-guest-migration-sles12sp5-from-sles12sp5-to-developing-xen-dst|backend died: Error connecting to VNC server <localhost:47225>: IO::Socket::INET: connect: Connection refused|openqaworker2
8090385|2022-02-03 00:23:51|done|incomplete|online_sles15sp1_ltss_pscc_basesys-srv_def_full_zdup|backend died: Error connecting to VNC server <localhost:52153>: IO::Socket::INET: connect: Connection refused|grenache-1
8090381|2022-02-03 00:14:23|done|incomplete|online_sles15sp3_pscc_basesys-srv_def_full_zdup|backend died: Error connecting to VNC server <localhost:53001>: IO::Socket::INET: connect: Connection refused|grenache-1
8090374|2022-02-03 00:08:38|done|incomplete|offline_sles15sp2_ltss_pscc_base_all_minimal|backend died: Error connecting to VNC server <localhost:55321>: IO::Socket::INET: connect: Connection refused|grenache-1
8090295|2022-02-03 00:08:07|done|incomplete|offline_sles12sp3_ltss_media_tcm-wsm_def_full|backend died: Error connecting to VNC server <localhost:41125>: IO::Socket::INET: connect: Connection refused|grenache-1
8090371|2022-02-02 23:57:48|done|incomplete|online_sles15sp2_ltss_pscc_basesys-srv-desk-dev_def_full_y|backend died: Error connecting to VNC server <localhost:37459>: IO::Socket::INET: connect: Connection refused|grenache-1
8089677|2022-02-02 23:56:52|done|incomplete|textmode|backend died: Error connecting to VNC server <localhost:60019>: IO::Socket::INET: connect: Connection refused|grenache-1
8089675|2022-02-02 23:54:00|done|incomplete|default|backend died: Error connecting to VNC server <localhost:39781>: IO::Socket::INET: connect: Connection refused|openqaworker2
8090368|2022-02-02 23:51:31|done|incomplete|online_sles15sp2_ltss_pscc_basesys-srv-wsm-contm_all_full_y|backend died: Error connecting to VNC server <localhost:44857>: IO::Socket::INET: connect: Connection refused|grenache-1
8089739|2022-02-02 23:34:39|done|incomplete|prj4_guest_upgrade_sles15sp3_on_developing-xen|backend died: Error connecting to VNC server <localhost:33053>: IO::Socket::INET: connect: Connection refused|grenache-1
8090294|2022-02-02 23:30:54|done|incomplete|offline_sles12sp3_ltss_pscc_asmm-lgm_all_full|backend died: Error connecting to VNC server <localhost:49109>: IO::Socket::INET: connect: Connection refused|grenache-1
8089448|2022-02-02 23:19:38|done|incomplete|uefi-gi-guest_developing-on-host_developing-kvm|backend died: Error connecting to VNC server <localhost:32997>: IO::Socket::INET: connect: Connection refused|grenache-1
8089774|2022-02-02 22:48:45|done|incomplete|gi-guest_sles15sp3-on-host_developing-kvm|backend died: Error connecting to VNC server <localhost:60419>: IO::Socket::INET: connect: Connection refused|grenache-1
8089207|2022-02-02 22:37:06|done|incomplete|allmodules+allpatterns-gnome|backend died: Error connecting to VNC server <10.161.145.85:5901>: IO::Socket::INET: connect: Connection refused|grenache-1
8088389|2022-02-02 22:32:45|done|incomplete|gi-guest_developing-on-host_developing-kvm|backend died: Error connecting to VNC server <localhost:46107>: IO::Socket::INET: connect: Connection refused|grenache-1
8089378|2022-02-02 22:22:38|done|incomplete|gi-guest_win2019-on-host_developing-xen|backend died: Error connecting to VNC server <localhost:49681>: IO::Socket::INET: connect: Connection refused|grenache-1
8089780|2022-02-02 22:19:48|done|incomplete|virt-guest-migration-developing-from-developing-to-developing-xen-src|backend died: Error connecting to VNC server <localhost:46003>: IO::Socket::INET: connect: Connection refused|openqaworker2
8089778|2022-02-02 22:18:25|done|incomplete|virt-guest-migration-developing-from-developing-to-developing-kvm-src|backend died: Error connecting to VNC server <localhost:42537>: IO::Socket::INET: connect: Connection refused|openqaworker2
8089769|2022-02-02 22:17:27|done|incomplete|gi-guest_developing-on-host_developing-xen|backend died: Error connecting to VNC server <localhost:47373>: IO::Socket::INET: connect: Connection refused|grenache-1
8088147|2022-02-02 22:12:32|done|incomplete|migration_offline_dvd_sle15sp1_ltss_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:56609>: IO::Socket::INET: connect: Connection refused|grenache-1
8088144|2022-02-02 22:12:10|done|incomplete|migration_offline_dvd_sle15sp2_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:60737>: IO::Socket::INET: connect: Connection refused|grenache-1
8089300|2022-02-02 22:11:55|done|incomplete|ssh-X|backend died: Error connecting to VNC server <localhost:39093>: IO::Socket::INET: connect: Connection refused|openqaworker2
8088150|2022-02-02 22:11:46|done|incomplete|migration_offline_dvd_sle12sp4_ltss_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:57655>: IO::Socket::INET: connect: Connection refused|grenache-1
8088143|2022-02-02 22:11:45|done|incomplete|migration_offline_dvd_sle15sp3_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:50937>: IO::Socket::INET: connect: Connection refused|grenache-1
8088177|2022-02-02 22:09:37|done|incomplete|migration_offline_dvd_sle12sp4_ltss_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:35093>: IO::Socket::INET: connect: Connection refused|grenache-1
8088176|2022-02-02 22:09:18|done|incomplete|migration_offline_dvd_sle15sp3_ha_alpha_node02|backend died: Error connecting to VNC server <localhost:56829>: IO::Socket::INET: connect: Connection refused|grenache-1
8088175|2022-02-02 22:08:17|done|incomplete|migration_offline_dvd_sle15sp2_ha_alpha_node01|backend died: Error connecting to VNC server <localhost:41807>: IO::Socket::INET: connect: Connection refused|grenache-openqa-query-for-job-label poo#76813

so multiple machines, many jobs reproduce this every day.

#28 Updated by cdywan 3 months ago

  • Subject changed from [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" to [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" size:M
  • Description updated (diff)
  • Status changed from New to Workable

#29 Updated by okurz 3 months ago

  • Subject changed from [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" size:M to [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused"
  • Category deleted (Bugs in existing tests)
  • Status changed from Workable to New
  • Target version deleted (Ready)

Actually we agreed in the estimation meeting that we did not manage to estimate it together so removing the estimate and workable status again.

#30 Updated by okurz 3 months ago

  • Copied to action #105882: Test using svirt backend fails with auto_review:"Error connecting to VNC server.*localhost.*Connection refused" added

#31 Updated by okurz 3 months ago

  • Project changed from openQA Tests to openQA Project
  • Description updated (diff)
  • Category set to Feature requests
  • Target version set to Ready

#32 Updated by okurz 3 months ago

  • Copied to action #106685: [tools] Test using svirt backend incomplete with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection timed out":retry added

#33 Updated by openqa_review 3 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: autoyast_reinstall
https://openqa.suse.de/tests/8180537

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#34 Updated by mkittler 3 months ago

  • Assignee set to mkittler

Looks like the OS is supposed to provide this VNC server as configured via the kernel command line:

    <os>
      <type>hvm</type>
      <initrd>/var/lib/libvirt/images/openQA-SUT-3.initrd</initrd>
      <kernel>/var/lib/libvirt/images/openQA-SUT-3.kernel</kernel>
      <cmdline>ifcfg=dhcp install=ftp://openqa.suse.de/SLE-15-SP4-Online-s390x-Build98.1-Media1  sshd=1 VNC=1 VNCSize=1024x768 VNCPassword=nots3cr3t sshpassword=nots3cr3t  autoyast=http://10.162.6.237:20333/NaB1iG65KSvSwjkj/assets/other/08180440-autoinst.xml SetHostname=0   regurl=http://all-98.1.proxy.scc.suse.de</cmdline>
    </os>
[    0.175007][    T0] Kernel command line: ifcfg=dhcp install=ftp://openqa.suse.de/SLE-15-SP4-Online-s390x-Build98.1-Media1  sshd=1 VNC=1 VNCSize=1024x768 VNCPassword=nots3cr3t sshpassword=nots3cr3t  autoyast=http://10.162.6.237:20333/NaB1iG65KSvSwjkj/assets/other/08180440-autoinst.xml SetHostname=0   regurl=http://all-98.1.proxy.scc.suse.de
[    0.175082][    T0] Unknown kernel command line parameters "ifcfg=dhcp install=ftp://openqa.suse.de/SLE-15-SP4-Online-s390x-Build98.1-Media1 sshd=1 VNC=1 VNCSize=1024x768 VNCPassword=nots3cr3t sshpassword=nots3cr3t autoyast=http://10.162.6.237:20333/NaB1iG65KSvSwjkj/assets/other/08180440-autoinst.xml SetHostname=0 regurl=http://all-98.1.proxy.scc.suse.de", will be passed to user space.

And it looks like it would actually start that VNC server at some point:

Loading Installation System (6/6) -            0% 34% 60%100%
BAD PASSWORD: it is based on a dictionary word
starting rsyslogd (logging to /dev/tty4)... ok
starting klogd... ok
starting nscd... ok
setting temporary root password to 'nots3cr3t'
BAD PASSWORD: it is based on a dictionary word
ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519 
Starting SSH daemon... ok
IP addresses:
  10.161.145.82
starting slpd to announce VNC...
starting yast...
Starting Installer
%GProbing connected terminal...
78
Initializing virtual console...

Found an unknown terminal on /dev/console (80 columns x 24 lines).
Sampling every 5 s to /var/log/YaST2/memsample.zcat

starting VNC server...
A log file will be written to: /var/log/YaST2/vncserver.log ...

***
***           You can connect to <host>, display :1 now with vncviewer
***           Or use a Java capable browser on http://<host>:5801/
***

(When YaST2 is finished, close your VNC viewer and return to this window.)

Active interfaces:

eth0: 52:54:00:45:1d:43
  10.161.145.82/20
  fe80::5054:ff:fe45:1d43/64

*** Starting YaST2 ***

Then it looks like it is able to connect (still within the bootloader_start module):

[2022-02-17T11:26:12.837010+01:00] [debug] >>> testapi::wait_serial:  Starting YaST2 : ok
[2022-02-17T11:26:12.837241+01:00] [debug] tests/installation/bootloader_start.pm:48 called bootloader_zkvm::run -> tests/installation/bootloader_zkvm.pm:86 called testapi::select_console
[2022-02-17T11:26:12.837362+01:00] [debug] <<< testapi::select_console(testapi_console="installation")
[2022-02-17T11:26:12.838539+01:00] [debug] Establishing VNC connection to 10.161.145.82:5901
[2022-02-17T11:26:12.918091+01:00] [debug] led state 0 0 0 -261
[2022-02-17T11:26:12.926806+01:00] [debug] activate_console, console: installation, type: 
[2022-02-17T11:26:12.926912+01:00] [debug] activate_console called with generic type, no action
[2022-02-17T11:26:12.927185+01:00] [debug] tests/installation/bootloader_start.pm:48 called bootloader_zkvm::run -> tests/installation/bootloader_zkvm.pm:86 called testapi::select_console -> lib/susedistribution.pm:904 called testapi::assert_screen
[2022-02-17T11:26:12.927336+01:00] [debug] <<< testapi::assert_screen(mustmatch="installation", no_wait=1, timeout=30)

Then a check screen takes quite long and a VNC stall is detected:

[2022-02-17T11:29:45.752887+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 12.92 seconds for 134 candidate needles - make your needles more specific
[2022-02-17T11:29:45.753026+01:00] [debug] no match: 25.0s, best candidate: manual-intervention-20210719 (0.60)
[2022-02-17T11:29:45.759412+01:00] [warn] !!! backend::baseclass::do_capture: There is some problem with your environment, we detected a stall for 12.9358730316162 seconds
[2022-02-17T11:29:46.955969+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 1.19 seconds for 134 candidate needles - make your needles more specific
[2022-02-17T11:29:46.956108+01:00] [debug] no match: 12.0s, best candidate: manual-intervention-20210719 (0.56)
[2022-02-17T11:30:00.022348+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 13.06 seconds for 134 candidate needles - make your needles more specific
[2022-02-17T11:30:00.022498+01:00] [debug] no match: 10.8s, best candidate: manual-intervention-20210719 (0.60)
[2022-02-17T11:30:00.026329+01:00] [warn] !!! backend::baseclass::do_capture: There is some problem with your environment, we detected a stall for 13.0687019824982 seconds
[2022-02-17T11:30:13.001954+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 12.97 seconds for 134 candidate needles - make your needles more specific
[2022-02-17T11:30:13.002110+01:00] [debug] no match: -2.3s, best candidate: manual-intervention-20210719 (0.60)
[2022-02-17T11:30:13.292796+01:00] [debug] >>> testapi::_check_backend_response: match=autoyast-boot,autoyast-error,autoyast-stage1-reboot-upcoming,bios-boot,import-untrusted-gpg-key,linuxrc-install-fail,manual-intervention,nonexisting-package,nvidia-validation-failed,package-notification,reboot-after-installation,scc-invalid-url,warning-pop-up timed out after 50 (check_screen)
[2022-02-17T11:30:13.327518+01:00] [warn] !!! testapi::_check_backend_response: stall detected during check_screen failure!
[2022-02-17T11:30:13.338752+01:00] [debug] tests/autoyast/installation.pm:178 called installation::verify_timeout_and_check_screen -> tests/autoyast/installation.pm:103 called testapi::check_screen
[2022-02-17T11:30:13.339084+01:00] [debug] <<< testapi::check_screen(mustmatch=[
    "bios-boot",
    "nonexisting-package",
    "reboot-after-installation",
    "linuxrc-install-fail",
    "scc-invalid-url",
    "warning-pop-up",
    "autoyast-boot",
    "package-notification",
    "nvidia-validation-failed",
    "import-untrusted-gpg-key",
    "autoyast-error",
    "autoyast-stage1-reboot-upcoming",
    "manual-intervention"
  ], timeout=50)
[2022-02-17T11:30:15.389680+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 2.05 seconds for 134 candidate needles - make your needles more specific
[2022-02-17T11:30:15.389822+01:00] [debug] no match: 49.9s, best candidate: manual-intervention-20210719 (0.56)

This happens multiple times:

[2022-02-17T11:36:41.374822+01:00] [warn] !!! backend::baseclass::do_capture: There is some problem with your environment, we detected a stall for 12.8661558628082 seconds

Then we see log messages indicating a reboot - so the system was actually just rebooting (which would explain the VNC stall):

[    3.058782][    T1] systemd[1]: Detected virtualization kvm.

[    3.058787][    T1] systemd[1]: Detected architecture s390x.



Welcome to SUSE Linux Enterprise Server 15 SP4!

It also tries to start the VNC server again. However, that doesn't work because the address is already in use:

***
***  Please return to your X-Server screen to finish installation
***


starting VNC server...
A log file will be written to: /var/log/YaST2/vncserver.log ...

***
***           You can connect to <host>, display :1 now with vncviewer
***           Or use a Java capable browser on http://<host>:5801/
***

(When YaST2 is finished, close your VNC viewer and return to this window.)

Active interfaces:

eth0: 52:54:00:45:1d:43
  10.161.145.82/20
  fe80::5054:ff:fe45:1d43/64

*** Starting YaST2 ***
2022-02-17T05:41:40.397594-05:00 susetest systemd[6068]: xvnc.socket: Failed to create listening socket ([::]:5901): Address already in use
[FAILED] Failed to listen on Xvnc Server.
2022-02-17T05:41:40.407537-05:00 susetest systemd[1]: xvnc.socket: Failed to receive listening socket ([::]:5901): Input/output error
2022-02-17T05:41:40.407645-05:00 susetest systemd[1]: Failed to listen on Xvnc Server.
removed '/root/.vnc/passwd.yast'
%@


Welcome to SUSE Linux Enterprise Server 15 SP4 Snapshot-202202-3 (s390x) - Kernel 5.14.21-150400.7-default (ttysclp0).

That os-autoinst fails to re-connect is therefore no surprise:

[2022-02-17T11:38:42.215876+01:00] [debug] <<< testapi::select_console(testapi_console="x11", await_console=0)
[2022-02-17T11:38:42.216437+01:00] [debug] Establishing VNC connection to 10.161.145.82:5901
[2022-02-17T11:38:44.221660+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:45.224877+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:46.226871+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:47.228471+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:48.231267+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:49.232907+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:50.234302+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:51.236243+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host
[2022-02-17T11:38:52.365064+01:00] [info] ::: basetest::runtest: # Test died: Error connecting to VNC server <10.161.145.82:5901>: IO::Socket::INET: connect: No route to host at /usr/lib/os-autoinst/testapi.pm line 1763.
    testapi::select_console("x11", "await_console", 0) called at sle/lib/utils.pm line 1482
    utils::reconnect_mgmt_console("timeout", 700, "grub_timeout", 180) called at sle/tests/autoyast/installation.pm line 323
    installation::run(installation=HASH(0x1000d4c3c18)) called at /usr/lib/os-autoinst/basetest.pm line 360
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 354
    basetest::runtest(installation=HASH(0x1000d4c3c18)) called at /usr/lib/os-autoinst/autotest.pm line 372
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 372
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 242
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 242
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x1000e24ea88)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x1000e24ea88), CODE(0x1000f5b1de0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x1000e24ea88)) called at /usr/lib/os-autoinst/autotest.pm line 298

I've also checked some other jobs and they look similar.

#35 Updated by mkittler 3 months ago

Strangely, the last good job (https://openqa.suse.de/tests/8014876) has the same error:

starting VNC server...
A log file will be written to: /var/log/YaST2/vncserver.log ...

***
***           You can connect to <host>, display :1 now with vncviewer
***           Or use a Java capable browser on http://<host>:5801/
***

(When YaST2 is finished, close your VNC viewer and return to this window.)

Active interfaces:

eth0: 52:54:00:84:f8:77
  10.161.145.86/20
  fe80::5054:ff:fe84:f877/64

*** Starting YaST2 ***
2022-01-21T15:48:14.257648-05:00 susetest systemd[6058]: xvnc.socket: Failed to create listening socket ([::]:5901): Address already in use
[FAILED] Failed to listen on Xvnc Server.
2022-01-21T15:48:14.268804-05:00 susetest systemd[1]: xvnc.socket: Failed to receive listening socket ([::]:5901): Input/output error
2022-01-21T15:48:14.268915-05:00 susetest systemd[1]: Failed to listen on Xvnc Server.
removed '/root/.vnc/passwd.yast'
%@


Welcome to SUSE Linux Enterprise Server 15 SP4 Beta3-202201 (s390x) - Kernel 5.14.21-150400.4-default (ttysclp0).

So this is likely a wrong lead. In the good run, the VNC connection can be established despite this error:

eth0: 10.161.145.86 fe80::5054:ff:fe84:f877


susetest login: [2022-01-21T21:45:22.857260+01:00] [debug] >>> testapi::wait_serial: (?^u:Welcome to SUSE Linux Enterprise .*\(s390x\)): ok
[2022-01-21T21:45:22.858722+01:00] [debug] tests/autoyast/installation.pm:323 called utils::reconnect_mgmt_console -> lib/utils.pm:1453 called testapi::select_console
[2022-01-21T21:45:22.858907+01:00] [debug] <<< testapi::select_console(testapi_console="x11", await_console=0)
[2022-01-21T21:45:22.859517+01:00] [debug] Establishing VNC connection to 10.161.145.86:5901
[2022-01-21T21:45:24.077348+01:00] [debug] led state 0 0 0 -261
[2022-01-21T21:45:24.086722+01:00] [debug] activate_console, console: x11, type: 
[2022-01-21T21:45:24.086846+01:00] [debug] activate_console called with generic type, no action
[2022-01-21T21:45:24.211842+01:00] [debug] ||| finished installation autoyast (runtime: 821 s)
[2022-01-21T21:45:24.229569+01:00] [debug] ||| starting console tests/autoyast/console.pm
[2022-01-21T21:45:24.230706+01:00] [debug] tests/autoyast/console.pm:18 called testapi::select_console
[2022-01-21T21:45:24.230833+01:00] [debug] <<< testapi::select_console(testapi_console="root-console")
[2022-01-21T21:45:24.240771+01:00] [debug] Establishing VNC connection to localhost:56165
[2022-01-21T21:45:25.393453+01:00] [debug] Connected to Xvnc - PID 675439
icewm PID is 675520
[2022-01-21T21:45:26.402275+01:00] [debug] Wait for SSH on host 10.161.145.86 (timeout: 120)
xterm PID is 675522
[2022-01-21T21:45:26.466387+01:00] [debug] led state 0 0 0 -261
[2022-01-21T21:45:26.474955+01:00] [debug] activate_console, console: root-console, type: console
[2022-01-21T21:45:26.475086+01:00] [debug] backend ipmi || spvm || pvm_hmc || s390x || zkvm
[2022-01-21T21:45:26.475306+01:00] [debug] tests/autoyast/console.pm:18 called testapi::select_console -> lib/susedistribution.pm:748 called susedistribution::handle_password_prompt -> lib/susedistribution.pm:48 called testapi::assert_screen
[2022-01-21T21:45:26.475465+01:00] [debug] <<< testapi::assert_screen(mustmatch="password-prompt", timeout=60)

#36 Updated by mkittler 3 months ago

Maybe the failing VNC connection isn't used anyways? In other words: Maybe in the good case the lingering VNC server is lingering long enough to allow the re-connect but actually the test does not need this re-connect attempt anyways? This would make sense considering it is also establishing a connection to the "worker-local" Xvnc server (and supposedly uses that VNC server instead of the SUT-provided one). This would mean that the test should simply remove the VNC console if it isn't used anymore.

Even if that's not the case I'm wondering what we can do from os-autoinst. If the test really needs this SUT-provided VNC server the itself test must somehow ensure that it can be started.

#37 Updated by mkittler 3 months ago

  • Status changed from New to Feedback

#38 Updated by mkittler 3 months ago

I overlooked one thing: The VNC port differs. The VNC-server in the SUT is using port 5801 but the connection attempt is to port 5901. The port also differs in the good case. So am I right to assume that this has nothing to do with the SUT-provided VNC after all?

#39 Updated by mkittler 3 months ago

I suppose the 10.161.145.x IP is the svrit jumphost, considering that the x86_64 jobs actually use a domain instead (openqaw5-xen.qa.suse.de) which is definitely the svirt jumphost.

So this is mainly a problem for svirt/x390x and svirt/xen/x86_64:

openqa=> select id, t_finished, test, (select value from job_settings where job_id = jobs.id and key = 'BACKEND') as backend, arch, reason from jobs where reason like '%Error connecting to VNC server%<10%refused%' order by id desc limit 40;
   id    |     t_finished      |                                   test                                   | backend | arch  |                                                      reason                                                      
---------+---------------------+--------------------------------------------------------------------------+---------+-------+------------------------------------------------------------------------------------------------------------------
 8226548 | 2022-02-25 01:35:39 | qam-gnome                                                                | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection refused
 8220467 | 2022-02-24 01:49:45 | qam-gnome                                                                | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.99:5901>: IO::Socket::INET: connect: Connection refused
 8194332 | 2022-02-19 01:26:11 | qam-gnome                                                                | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.93:5901>: IO::Socket::INET: connect: Connection refused
 8158377 | 2022-02-14 10:39:51 | minimal_x                                                                | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.99:5901>: IO::Socket::INET: connect: Connection refused
 8127731 | 2022-02-09 20:09:13 | offline_sles12sp5_pscc_base_all_minimal:investigate:last_good_build:81.1 | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
 8118684 | 2022-02-08 10:48:37 | qam-gnome                                                                | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.97:5901>: IO::Socket::INET: connect: Connection refused
 8089207 | 2022-02-02 22:37:06 | allmodules+allpatterns-gnome                                             | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.85:5901>: IO::Socket::INET: connect: Connection refused
 7983693 | 2022-01-17 10:49:23 | migration_offline_scc_sle12sp4_ltss_ha_alpha_node02                      | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.80:5901>: IO::Socket::INET: connect: Connection refused
 7911848 | 2021-12-27 10:51:45 | offline_sles15sp2_ltss_media_basesys-srv-desk-dev_def_full               | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.90:5901>: IO::Socket::INET: connect: Connection refused
 7909289 | 2021-12-27 02:59:43 | offline_sles15sp3_media_basesys-srv-desk-dev-contm-lgm-tsm-wsm_all_full  | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection refused
 7898028 | 2021-12-23 08:39:05 | offline_sles12sp5_media_sdk-asmm-contm-lgm-tcm-wsm_all_full              | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.99:5901>: IO::Socket::INET: connect: Connection refused
(11 Zeilen)
openqa=> select id, t_finished, test, (select value from job_settings where job_id = jobs.id and key = 'BACKEND') as backend, arch, reason from jobs where reason like '%Error connecting to VNC server%<op%refused%' order by id desc limit 40;
   id    |     t_finished      |                                 test                                  | backend |  arch  |                                                            reason                                                            
---------+---------------------+-----------------------------------------------------------------------+---------+--------+------------------------------------------------------------------------------------------------------------------------------
 8208649 | 2022-02-22 08:13:41 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8208647 | 2022-02-22 08:13:29 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8204491 | 2022-02-21 15:46:47 | jeos-extratest@jlausuch/os-autoinst-distri-opensuse#suseconnect_retry | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8202950 | 2022-02-21 15:46:48 | jeos-extratest:investigate:last_good_build:20220220-1                 | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8167219 | 2022-02-15 01:36:31 | jeos-main                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8164693 | 2022-02-14 21:48:08 | ext4_yast@jknphy/os-autoinst-distri-opensuse#poc_enh_yaml_sch         | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8164561 | 2022-02-14 21:44:01 | ext4_yast@jknphy/os-autoinst-distri-opensuse#poc_enh_yaml_sch         | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8149970 | 2022-02-11 23:54:22 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8149146 | 2022-02-11 23:55:17 | jeos-containers:investigate:last_good_build:20220210-1                | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8124320 | 2022-02-09 09:18:43 | jeos-main                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8122027 | 2022-02-09 09:18:29 | jeos-extratest                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8117896 | 2022-02-08 11:09:01 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8111540 | 2022-02-06 10:59:41 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8110996 | 2022-02-06 10:59:23 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8090209 | 2022-02-03 02:15:10 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8034373 | 2022-01-26 17:36:50 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8034370 | 2022-01-26 17:36:46 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8033344 | 2022-01-26 09:55:19 | jeos-main                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen-1.qa.suse.de:5919>: IO::Socket::INET: connect: Connection refused
 8031298 | 2022-01-26 11:46:24 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8031279 | 2022-01-26 11:46:36 | jeos-extratest                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8030832 | 2022-01-26 00:42:56 | jeos-fips                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8030822 | 2022-01-26 00:42:56 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8007455 | 2022-01-21 07:59:18 | jeos-extratest                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8007453 | 2022-01-21 07:59:05 | jeos-extratest                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 7996983 | 2022-01-19 04:28:29 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7995973 | 2022-01-18 21:29:31 | lvm+RAID1                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7995972 | 2022-01-18 21:29:31 | ext4                                                                  | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 7983670 | 2022-01-17 09:11:06 | jeos-main                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 7983666 | 2022-01-17 09:06:58 | jeos-fs_stress                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 7983665 | 2022-01-17 09:06:09 | jeos-fips                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7983664 | 2022-01-17 09:06:04 | jeos-fips                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 7983663 | 2022-01-17 09:03:47 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 7983661 | 2022-01-17 09:03:48 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7983660 | 2022-01-17 09:03:47 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 7977963 | 2022-01-15 21:58:10 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7977962 | 2022-01-15 21:58:11 | jeos-fs_stress                                                        | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 7977961 | 2022-01-15 21:57:41 | jeos-main                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 7977949 | 2022-01-15 21:55:25 | jeos-fips                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 7977948 | 2022-01-15 21:42:21 | jeos-fips                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 7977944 | 2022-01-15 21:46:14 | jeos-containers                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
(40 Zeilen)

That means my previous conclusion that it is about the SUT-provided VNC connection is wrong (also because that VNC server runs on a different port anyways). That also explains why the error about it (that also shows up in passing jobs) is not relevant here.

So either the svirt host isn't reachable anymore in these cases or the virtualization tool (I suppose QEMU in case of s390s and xen in case of x86_64) stopped the VNC server (maybe as part of stopping the VM itself).

#40 Updated by mkittler 3 months ago

Not all those failures look the same if you have a close look at the logs. E.g. in https://openqa.suse.de/tests/8226548 the connection stops working with a VNC protocol error during the installation:

[2022-02-25T02:21:17.477183+01:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.62 seconds for 124 candidate needles - make your needles more specific
[2022-02-25T02:21:17.477325+01:00] [debug] no match: 19.2s, best candidate: manual-intervention2-20170718 (0.37)
[2022-02-25T02:34:59.441635+01:00] [warn] !!! consoles::VNC::catch {...} : Error in VNC protocol - relogin: short read for zrle data 893 - 1524
[2022-02-25T02:35:33.553842+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection refused
[2022-02-25T02:35:33.554512+01:00] [debug] Closing SSH serial connection with s390zp19.suse.de

In the case from #76813#note-34 it looks more like the VNC connection is re-established after a stall was detected.


Btw, in the logs of svirt/x86_64/xen job https://openqa.suse.de/tests/8208649 we can clearly see that the VNC server is supposed to be provided by libvirt/xen as the corresponding config is logged:

      <graphics type="vnc" port="5902" autoport="no" listen="0.0.0.0" sharePolicy="force-shared" passwd="nots3cr3t">
        <listen type="address" address="0.0.0.0"/>
      </graphics>

In this case it is also definitely the VNC stall detection which triggers the re-connect:

[2022-02-22T09:13:13.523577+01:00] [debug] <<< testapi::type_string(string="(echo 1spU0; bash -eox pipefail /tmp/script1spU0.sh ; echo SCRIPT_FINISHED1spU0-\$?-) | tee /dev/ttyS0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-02-22T09:13:17.386004+01:00] [debug] tests/jeos/firstrun.pm:184 called firstrun::verify_mounts -> tests/jeos/firstrun.pm:54 called testapi::script_output
[2022-02-22T09:13:17.386286+01:00] [debug] <<< testapi::wait_serial(record_output=1, timeout=90, quiet=undef, buffer_size=undef, no_regex=0, regexp="SCRIPT_FINISHED1spU0-\\d+-", expect_not_found=0)
[2022-02-22T09:13:17.880381+01:00] [debug] considering VNC stalled, no update for 4.31 seconds
[2022-02-22T09:13:19.884068+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:20.885054+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:21.886534+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:22.887663+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:23.889156+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:24.890710+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:25.891860+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:26.893287+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:27.898708+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2022-02-22T09:13:27.899420+01:00] [debug] Closing SSH serial connection with openqaw5-xen.qa.suse.de
[2022-02-22T09:13:27.899830+01:00] [debug] Destroying openQA-SUT-2 virtual machine

The logs of svirt/s390x don't show such a VNC configuration on VM-level (there is only a cmdline for spawning a VNC server within the SUT but that's not the problematic VNC server here). E.g. that's the entire virsh config for https://openqa.suse.de/tests/8226548:

  <domain type='kvm' id='47356'>
    <name>openQA-SUT-6</name>
    <uuid>d6166688-f8d6-4b34-be45-054500996ff8</uuid>
    <description>openQA WebUI: openqa.suse.de (6): 08226548-sle-12-SP5-Server-DVD-Updates-s390x-Build20220225-1-qam-gnome@s390x-kvm-sle12</description>
    <memory unit='KiB'>2097152</memory>
    <currentMemory unit='KiB'>2097152</currentMemory>
    <vcpu placement='static'>1</vcpu>
    <resource>
      <partition>/machine</partition>
    </resource>
    <os>
      <type arch='s390x' machine='s390-ccw-virtio-3.1'>hvm</type>
      <kernel>/var/lib/libvirt/images/openQA-SUT-6.kernel</kernel>
      <initrd>/var/lib/libvirt/images/openQA-SUT-6.initrd</initrd>
      <cmdline>ifcfg=dhcp install=ftp://openqa.suse.de/SLE-12-SP5-Server-DVD-s390x-GM-DVD1  sshd=1 VNC=1 VNCSize=1024x768 VNCPassword=nots3cr3t sshpassword=nots3cr3t  autoyast=http://10.162.6.237:20453/_nIfx2cAt1O6Ae9N/files/autoyast_qam/12-SP5_installation.xml SetHostname=0   regurl=https://scc.suse.com</cmdline>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <pae/>
    </features>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/bin/qemu-system-s390x</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='unsafe'/>
        <source file='/var/lib/libvirt/images/openQA-SUT-6a.img'/>
        <backingStore/>
        <target dev='vda' bus='virtio'/>
        <alias name='virtio-disk0'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
      </disk>
      <controller type='pci' index='0' model='pci-root'>
        <alias name='pci.0'/>
      </controller>
      <controller type='virtio-serial' index='0'>
        <alias name='virtio-serial0'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0002'/>
      </controller>
      <interface type='direct'>
        <mac address='52:54:00:55:e3:d2'/>
        <source dev='eth0' mode='bridge'/>
        <target dev='macvtap4'/>
        <model type='virtio'/>
        <alias name='net0'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
      </interface>
      <console type='pty' tty='/dev/pts/2'>
        <source path='/dev/pts/2'/>
        <target type='sclp' port='0'/>
        <alias name='console0'/>
      </console>
      <console type='pty'>
        <source path='/dev/pts/22'/>
        <target type='virtio' port='1'/>
        <alias name='console1'/>
      </console>
      <memballoon model='virtio'>
        <alias name='balloon0'/>
        <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0003'/>
      </memballoon>
      <panic model='s390'/>
    </devices>
  </domain>

That makes also sense as I couldn't found an add_vnc call in s390s test code but leaves me wondering where the VNC server on 10.161.145.95:5901 is configured. It definitely exists as the first connection to it can be established:

[2022-02-25T02:17:10.371517+01:00] [debug] Establishing VNC connection to 10.161.145.95:5901
[2022-02-25T02:17:10.455881+01:00] [debug] activate_console, console: installation, type: 
[2022-02-25T02:17:10.455973+01:00] [debug] activate_console called with generic type, no action
[2022-02-25T02:17:10.456175+01:00] [debug] tests/installation/bootloader_start.pm:48 called bootloader_zkvm::run -> tests/installation/bootloader_zkvm.pm:86 called testapi::select_console -> lib/susedistribution.pm:904 called testapi::assert_screen
[2022-02-25T02:17:10.456287+01:00] [debug] <<< testapi::assert_screen(mustmatch="installation", timeout=30, no_wait=1)
[2022-02-25T02:17:11.282043+01:00] [debug] >>> testapi::_handle_found_needle: found inst-console-20191204, similarity 1.00 @ 71/49

It is definitely not the local XVnc because that one is also established (way before):

[2022-02-25T02:15:52.977641+01:00] [debug] Establishing VNC connection to localhost:44977
[2022-02-25T02:15:53.118048+01:00] [debug] Connected to Xvnc - PID 142729

It is also not the SUT provided one because that one uses a different port:

starting VNC server...
A log file will be written to: /var/log/YaST2/vncserver.log ...

***
***           You can connect to <host>, display :1 now with vncviewer
***           Or use a Java capable browser on http://<host>:5801/

I suspect in the s390x case the VNC server on port 5901 is simply present by default and added by the not svirt/s390x specific code in opensuse/lib/susedistribution.pm (where 5901 is hard-coded).


So likely all tests which use the svirt backend's add_vnc function to access the VM's VNC are affected (done in bootloader_svirt module) and svirt/s390x tests where the VNC console is configured in more mysterious ways.

Now that it is somewhat clear which VNC server breaks during the test execution (the one configured on VM-level) the question remains why that happens. The VNC server must be broken/unreachable for more than 10 seconds because our code tries to re-connect for 10 seconds.

It seems to happen quite randomly, except for the s390 cases where it happens often in the installation module (e.g. https://openqa.suse.de/tests/8226548 and previous jobs).

There's special code within the VNC backend to destroy the VM in case of a reboot. (You can see the related XML in the config above.) However, this code is actually supposed to fix this problem and since the problem doesn't seem to correspond always to restarts it is unlikely to be the cause.

#41 Updated by mkittler 2 months ago

As next steps we could:

#42 Updated by mkittler 2 months ago

I scheduled a bunch of s390x jobs: for i in {01..50} ; do openqa-clone-job --within-instance https://openqa.suse.de --skip-chained-deps 8226548 TEST=$USER_poo76813_s390x_$i BUILD=poo76813_s390x_investigation _GROUP=0 ; done -> https://openqa.suse.de/tests/overview?build=poo76813_s390x_investigation&version=12-SP5&distri=sle

And a bunch of xen jobs: for i in {01..50} ; do openqa-clone-job --within-instance https://openqa.suse.de --skip-chained-deps 8208649 TEST=$USER_poo76813_xen_$i BUILD=poo76813_xen_investigation _GROUP=0 ; done -> https://openqa.suse.de/tests/overview?distri=sle&build=poo76813_xen_investigation&version=15-SP4

#43 Updated by mkittler 2 months ago

All xen jobs passed. So it doesn't happen very often in that case - at least not during the time frame the test runs have been executed. All x390 tests failed in the same way but due to a different/unrelated issue. So the s390x runs didn't provide any new information.

Maybe it isn't the best approach to catch random network problems (which might be the cause for this). I guess I'll just increase the timeout for now.

#44 Updated by mkittler 2 months ago

I suppose finding out where in the test distribution the timeout needed to be specified isn't worth it considering that it would be nice if os-autoinst would provide a better default anyways. So I created a PR to increase the VNC timeout for remote VNC connections: https://github.com/os-autoinst/os-autoinst/pull/1992

#45 Updated by okurz 2 months ago

  • Has duplicate deleted (action #99345: [tools][qem] Incomplete test runs on s390x with auto_review:"backend died: Error connecting to VNC server.*s390.*Connection timed out":retry size:M)

#46 Updated by okurz 2 months ago

  • Related to action #99345: [tools][qem] Incomplete test runs on s390x with auto_review:"backend died: Error connecting to VNC server.*s390.*Connection timed out":retry size:M added

#47 Updated by mkittler about 2 months ago

The PR has been merged since 16.03.22 11:12.

It didn't really help that much:

openqa=> select id, t_finished, test, (select value from job_settings where job_id = jobs.id and key = 'BACKEND') as backend, arch, reason from jobs where reason like '%Error connecting to VNC server%<10%refused%' and t_finished >= '2022-03-16 11:12:00' order by id desc limit 40;
   id    |     t_finished      |       test       | backend | arch  |                                                      reason                                                      
---------+---------------------+------------------+---------+-------+------------------------------------------------------------------------------------------------------------------
 8360191 | 2022-03-21 01:37:27 | qam-minimal+base | svirt   | s390x | backend died: Error connecting to VNC server <10.161.145.94:5901>: IO::Socket::INET: connect: Connection refused
(1 Zeile)

openqa=> select id, t_finished, test, (select value from job_settings where job_id = jobs.id and key = 'BACKEND') as backend, arch, reason from jobs where reason like '%Error connecting to VNC server%<op%refused%' and t_finished >= '2022-03-16 11:12:00' order by id desc limit 40;
   id    |     t_finished      |                                 test                                  | backend |  arch  |                                                           reason                                                           
---------+---------------------+-----------------------------------------------------------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------------
 8345350 | 2022-03-18 06:20:43 | ext4_yast                                                             | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8341382 | 2022-03-17 16:55:24 | extra_tests_gnome_sdk@Avinesh/os-autoinst-distri-opensuse#libqt5_test | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
 8339330 | 2022-03-17 16:53:02 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8338416 | 2022-03-18 00:53:49 | jeos-extratest:investigate:last_good_build:20220316-1                 | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
 8338414 | 2022-03-18 00:54:16 | jeos-extratest:investigate:retry                                      | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8336822 | 2022-03-17 07:56:06 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
 8336821 | 2022-03-17 07:55:51 | jeos-filesystem                                                       | svirt   | x86_64 | backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
(7 Zeilen)

#48 Updated by mkittler about 2 months ago

Judging by e.g. https://openqa.suse.de/tests/8336822

[2022-03-17T08:54:46.194965+01:00] [debug] considering VNC stalled, no update for 4.05 seconds
[2022-03-17T08:54:48.198468+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
(omitted 54 similar lines)
[2022-03-17T08:55:45.289048+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
[2022-03-17T08:55:46.294484+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused
[2022-03-17T08:55:46.295165+01:00] [debug] Closing SSH serial connection with openqaw5-xen.qa.suse.de

it really tried to re-connect for a minute. It looks similar for s390x, e.g. https://openqa.suse.de/tests/8360191

[2022-03-21T02:36:15.534600+01:00] [debug] no match: 58.9s, best candidate: addon-products-20190923 (0.00)
%@[2022-03-21T02:36:16.241419+01:00] [debug] no change: 57.9s
[2022-03-21T02:36:16.541229+01:00] [debug] no match: 57.9s, best candidate: addon-products-20190923 (0.00)

Main Menu

0) <-- Back <--
1) Start Installation          
2) Settings               
3) Expert                
4) Exit or Reboot            

> [2022-03-21T02:36:17.242377+01:00] [debug] no change: 56.9s
[2022-03-21T02:36:17.544547+01:00] [debug] no match: 56.9s, best candidate: addon-products-20190923 (0.00)
[2022-03-21T02:36:18.213362+01:00] [debug] considering VNC stalled, no update for 4.00 seconds
[2022-03-21T02:36:20.216730+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.94:5901>: IO::Socket::INET: connect: Connection refused
(…)
[2022-03-21T02:37:17.290986+01:00] [warn] !!! consoles::VNC::login: Error connecting to VNC server <10.161.145.94:5901>: IO::Socket::INET: connect: Connection refused
[2022-03-21T02:37:18.294195+01:00] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Error connecting to VNC server <10.161.145.94:5901>: IO::Socket::INET: connect: Connection refused
[2022-03-21T02:37:18.294914+01:00] [debug] Closing SSH serial connection with s390zp19.suse.de
[2022-03-21T02:37:18.295244+01:00] [debug] Destroying openQA-SUT-5 virtual machine

Likely the VNC server is just not reachable anymore at this point (and therefore extending the timeout further is pointless).

I'd say that maybe the test it just doing something that turns down the VM. However, the problem happens in different test modules (e.g. lvm or snapper_create) and most of the times it also just works.

#49 Updated by okurz about 2 months ago

I have the feeling this might be related to #107062 as well

#50 Updated by coolgw about 1 month ago

Another kind of error also found in recent run:
https://openqa.nue.suse.com/tests/8459244/logfile?filename=autoinst-log.txt
[2022-04-02T14:18:07.546982+02:00] [warn] !!! consoles::VNC::catch {...} : Error in VNC protocol - relogin: unexpected end of data

#51 Updated by okurz about 1 month ago

  • Parent task set to #109656

#52 Updated by openqa_review 25 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_gnome
https://openqa.suse.de/tests/8573572#step/boot_to_desktop/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#53 Updated by okurz 15 days ago

  • Due date set to 2022-05-06
  • Priority changed from Normal to High

Needs to be discussed urgently as it blocks #99345 which showed up in https://progress.opensuse.org/issues?query_id=544

#54 Updated by cdywan 15 days ago

Discussed this a bit:

  • We should add a retry for this whilst we can't track down the cause (and reduce prio)
  • Maybe #107026 is worth looking into first

#55 Updated by mkittler 15 days ago

  • Subject changed from [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused" to [tools] Test using svirt backend fails with auto_review:"Error connecting to VNC server.*: IO::Socket::INET: connect: Connection refused":retry
  • Description updated (diff)
  • Priority changed from High to Normal

#56 Updated by cdywan 15 days ago

  • Related to action #107026: Improve existing unit tests for VNC module to increase its test coverage (before doing any actual changes) size:M added

#57 Updated by okurz 15 days ago

  • Due date deleted (2022-05-06)
  • Priority changed from Normal to High

#58 Updated by okurz 15 days ago

  • Status changed from Feedback to Blocked
  • Priority changed from High to Normal

Let's consider this blocked by #107026 then. I assume that unit testing is unlikely to fix this problem directly but will help us to rule out some unlikely hypotheses as well as help when we want to introduce changes or fixes later on.

#59 Updated by okurz 6 days ago

  • Status changed from Blocked to New
  • Assignee deleted (mkittler)

#60 Updated by okurz 2 days ago

  • Target version changed from Ready to future

Also available in: Atom PDF