Project

General

Profile

Actions

action #114820

closed

Error connecting to VNC over WebSockets server provided by VMWare

Added by nanzhang over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-07-29
Due date:
2022-08-13
% Done:

0%

Estimated time:

Description

Observed vnc connection error at the end of the test 'online_upgrade_sles15sp3_vmware'.

http://10.67.129.66/tests/1368

qa2-dhcp-236 login: [2022-07-28T15:42:53.499053+08:00] [debug] considering VNC stalled, no update for 7.65 seconds
[2022-07-28T15:42:53.503015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-28T15:42:53.527221+08:00] [debug] >>> testapi::_handle_found_needle: found displaymanager-20210812, similarity 1.00 @ 312/312
[2022-07-28T15:42:55.669842+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:56.739028+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused

http://10.67.129.66/tests/1372

[2022-07-29T13:53:14.086680+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901
[2022-07-29T13:53:14.094162+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:53:14.20162] [12112] [info] Rejecting new client; already one client connected
[2022-07-29 13:53:14.20229] [12112] [info] Client closed connection
[2022-07-29 13:53:14.20418] [12112] [info] WebSocket closed with status 1000.
[2022-07-29T13:53:14.349552+08:00] [info] ::: basetest::runtest: # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 187.

Related issues 1 (0 open1 closed)

Related to openQA Project - coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0Resolvedokurz2021-10-11

Actions
Actions #2

Updated by nanzhang over 1 year ago

  • Related to coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 added
Actions #3

Updated by mkittler over 1 year ago

  • Description updated (diff)
Actions #4

Updated by mkittler over 1 year ago

About the 1st test:

The logs when it establishes the VNC connection over WebSockets look like this:

[2022-07-28T15:41:27.755085+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:346 called testapi::select_console
[2022-07-28T15:41:27.755297+08:00] [debug] <<< testapi::select_console(testapi_console="sut")
[2022-07-28T15:41:27.757702+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901
[2022-07-28 15:41:27.76429] [12482] [info] Client closed connection
[2022-07-28T15:41:27.766355+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-28 15:41:27.76790] [12482] [info] WebSocket closed with status 1000.
[2022-07-28 15:41:28.85866] [13732] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/3e87a804707f97bf
[2022-07-28 15:41:28.85981] [13732] [info] Client accepted
[2022-07-28 15:41:28.87816] [13732] [info] WebSocket connection established
[2022-07-28T15:41:28.948694+08:00] [debug] activate_console, console: sut, type: 
[2022-07-28T15:41:28.948883+08:00] [debug] activate_console called with generic type, no action

So there was a connection before and it has re-established a new one.

Apparently our backend's VNC client closed the TCP connection to dewebsockify (Client closed connection) which then closed the WebSockets connection normally (WebSocket closed with status 1000):

[2022-07-28T15:42:14.749203+08:00] [debug] no match: 556.8s, best candidate: displaymanager-2-20200311 (0.00)
[2022-07-28 15:42:41.27382] [13732] [info] Client closed connection
[2022-07-28 15:42:41.27579] [13732] [info] WebSocket closed with status 1000.
[2022-07-28T15:42:45.826+08:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 30.77 seconds for 39 candidate needles - make your needles more specific
[2022-07-28T15:42:45.826292+08:00] [debug] no match: 555.8s, best candidate: displaymanager-2-20200311 (0.00)
[2022-07-28T15:42:45.848333+08:00] [warn] !!! backend::baseclass::do_capture: There is some problem with your environment, we detected a stall for 31.0974600315094 seconds

Likely this is due to the detected stall. Then the backend tries to re-establish the WebSocket connection (Establishing VNC connection over WebSockets…) but it somehow doesn't work as the VNC client cannot connect to dewebsockify (Connection refused). There are no log messages from dewebsockify but also no errors about launching it.


Welcome to SUSE Linux Enterprise Server 15 SP4  (x86_64) - Kernel 5.14.21-150400.22-default (ttyS0).

eth0: 10.67.129.236 fe80::20c:29ff:fee2:b526


qa2-dhcp-236 login: [2022-07-28T15:42:53.499053+08:00] [debug] considering VNC stalled, no update for 7.65 seconds
[2022-07-28T15:42:53.503015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-28T15:42:53.527221+08:00] [debug] >>> testapi::_handle_found_needle: found displaymanager-20210812, similarity 1.00 @ 312/312
[2022-07-28T15:42:55.669842+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:56.739028+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:57.821556+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:58.947867+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:43:00.239502+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:43:01.887419+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
Unexpected end of data 0
[2022-07-28T15:43:07.363730+08:00] [debug] backend process exited: 0

About the 2nd test:

It generally re-establishes the WebSocket connection when re-connecting on a VNC stall:

x11utils::handle_login -> lib/x11utils.pm:277 called testapi::wait_still_screen
[2022-07-29T13:32:00.687192+08:00] [debug] <<< testapi::wait_still_screen(timeout=30, stilltime=7, similarity_level=47)
[2022-07-29T13:32:05.477899+08:00] [debug] considering VNC stalled, no update for 4.13 seconds
[2022-07-29T13:32:05.478713+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:06.55828] [12104] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/e694bba6922ac435
[2022-07-29 13:32:06.56013] [12104] [info] Client accepted
[2022-07-29 13:32:06.57209] [12104] [info] WebSocket connection established
[2022-07-29T13:32:10.858549+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-07-29 13:32:10.85895] [12104] [info] Client closed connection
[2022-07-29T13:32:10.859444+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:11.93882] [12105] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/a3b949e095236e87
[2022-07-29 13:32:11.93993] [12105] [info] Client accepted
[2022-07-29 13:32:11.95380] [12105] [info] WebSocket connection established
[2022-07-29T13:32:15.010037+08:00] [debug] >>> testapi::wait_still_screen: detected same image for 7 seconds, last detected similarity is 1000000
[2022-07-29T13:32:15.010425+08:00] [debug] tests/migration/online_migration/zypper_patch.pm:37 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1166 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:1047 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:882 called x11utils::handle_login -> lib/x11utils.pm:278 called testapi::check_screen
[2022-07-29T13:32:15.010717+08:00] [debug] <<< testapi::check_screen(mustmatch=[
    "authentication-required-user-settings",
    "authentication-required-modify-system"
  ], timeout=15)
[2022-07-29T13:32:15.266956+08:00] [debug] no match: 14.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:16.014420+08:00] [debug] no change: 13.9s
[2022-07-29T13:32:16.144138+08:00] [debug] no match: 13.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:16.239328+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-07-29 13:32:16.23971] [12105] [info] Client closed connection
[2022-07-29T13:32:16.240015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:17.31037] [12107] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/a792a421c9a7067d
[2022-07-29 13:32:17.31156] [12107] [info] Client accepted
[2022-07-29 13:32:17.32297] [12107] [info] WebSocket connection established
[2022-07-29T13:32:17.347327+08:00] [debug] no change: 12.6s
[2022-07-29T13:32:17.472721+08:00] [debug] no match: 12.6s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:18.016319+08:00] [debug] no change: 11.9s
[2022-07-29T13:32:18.140001+08:00] [debug] no match: 11.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:19.438820+08:00] [debug] no match: 10.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:20.018165+08:00] [debug] no change: 9.9s
[2022-07-29T13:32:20.142951+08:00] [debug] no match: 9.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:21.019346+08:00] [debug] no change: 8.9s
[2022-07-29T13:32:21.142609+08:00] [debug] no match: 8.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:21.746633+08:00] [debug] considering VNC stalled, no update for 4.02 seconds
[2022-07-29 13:32:21.74699] [12107] [info] Client closed connection
[2022-07-29T13:32:21.747410+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:22.81808] [12108] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/bd8c34bdba437eb4
[2022-07-29 13:32:22.81939] [12108] [info] Client accepted
[2022-07-29 13:32:22.83111] [12108] [info] WebSocket connection established

The problem happens shortly after the VM is rebooted:

[2022-07-29T13:52:58.693752+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:337 called testapi::select_console -> lib/susedistribution.pm:847 called utils::save_svirt_pty -> lib/utils.pm:133 called testapi::enter_cmd
[2022-07-29T13:52:58.694145+08:00] [debug] <<< testapi::type_string(string="pty=`virsh dumpxml openQA-SUT-1 2>/dev/null | grep \"console type=\" | sed \"s/'/ /g\" | awk '{ print \$5 }'`\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[ 1316.059755] reboot: Restarting system
[2022-07-29T13:53:02.573818+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:337 called testapi::select_console -> lib/susedistribution.pm:847 called utils::save_svirt_pty -> lib/utils.pm:134 called testapi::enter_cmd
[2022-07-29T13:53:02.574237+08:00] [debug] <<< testapi::type_string(string="echo \$pty\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-07-29T13:53:02.949872+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:345 called testapi::wait_serial
[2022-07-29T13:53:02.950324+08:00] [debug] <<< testapi::wait_serial(no_regex=0, timeout=180, record_output=undef, expect_not_found=0, quiet=undef, buffer_size=undef, regexp="GNU GRUB")
GNU GRUB  version 2.06

+----------------------------------------------------------------------------+||||||||||||||||||||||||+----------------------------------------------------------------------------+     Use the ^ and v keys to select which entry is highlighted.          
      Press enter to boot the selected OS, `e' to edit the commands       
      before booting or `c' for a command-line.                            *SLES 15-SP4                                                                 Advanced options for SLES 15-SP4                                            Start bootloader from a read-only snapshot                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            [2022-07-29T13:53:14.084455+08:00] [debug] >>> testapi::wait_serial: GNU GRUB: ok
[2022-07-29T13:53:14.084802+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:346 called testapi::select_console
[2022-07-29T13:53:14.085032+08:00] [debug] <<< testapi::select_console(testapi_console="sut")
[2022-07-29T13:53:14.086680+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901
[2022-07-29T13:53:14.094162+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:53:14.20162] [12112] [info] Rejecting new client; already one client connected
[2022-07-29 13:53:14.20229] [12112] [info] Client closed connection
[2022-07-29 13:53:14.20418] [12112] [info] WebSocket closed with status 1000.
[2022-07-29T13:53:14.349552+08:00] [info] ::: basetest::runtest: # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 187.
   at /usr/lib/os-autoinst/testapi.pm line 1724.
    testapi::select_console("sut") called at sle/lib/power_action_utils.pm line 346
    power_action_utils::power_action("reboot", "textmode", 1) called at sle/tests/migration/online_migration/zypper_migration.pm line 162
    zypper_migration::run(zypper_migration=HASH(0x560697eb1498)) called at /usr/lib/os-autoinst/basetest.pm line 328
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 322
    basetest::runtest(zypper_migration=HASH(0x560697eb1498)) called at /usr/lib/os-autoinst/autotest.pm line 364
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 364
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 240
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 240
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 291
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0), CODE(0x56069953aa48)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0)) called at /usr/lib/os-autoinst/autotest.pm line 293
    autotest::start_process() called at /usr/bin/isotovideo line 262

The log message Rejecting new client; already one client connected indicates that the backend tries to create another VNC connection while one is still present (dewebsockify is still running and thinks our VNC client is connected to it). I suppose that's the problem. Now the question to answer is why it happened here and how it can be fixed.

Actions #5

Updated by mkittler over 1 year ago

but it somehow doesn't work as the VNC client cannot connect to dewebsockify

I suppose I know why that is. I'm working on a fix.

Actions #6

Updated by mkittler over 1 year ago

  • Status changed from New to In Progress
Actions #7

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Feedback

The cause is that dewebsockify is not launched a second time due to a logic error. The second test failure has probably the same cause and just looks different because the previous dewebsockify instance is actually still running.

So this PR should address both issues: https://github.com/os-autoinst/os-autoinst/pull/2136

I have only tested it via unit tests so it would be great if you could checkout my changes on your worker host and try them out.

Actions #8

Updated by okurz over 1 year ago

  • Due date set to 2022-08-13
  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #9

Updated by nanzhang over 1 year ago

  • Status changed from Feedback to In Progress

Verified this PR, and it got passed after applying the fix. Thanks
http://10.67.129.66/tests/1374#

Actions #10

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Resolved

Yes, it looks like it now survives multiple re-connects:

[2022-08-01T17:35:51.029054+08:00] [debug] <<< testapi::wait_still_screen(similarity_level=47, timeout=30, stilltime=7)
[2022-08-01T17:35:55.829717+08:00] [debug] considering VNC stalled, no update for 4.14 seconds
[2022-08-01 17:35:55.83001] [31166] [info] Client closed connection
[2022-08-01T17:35:55.831210+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:35:56.90925] [31177] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/2278fd27436587ad
[2022-08-01 17:35:56.91047] [31177] [info] Client accepted
[2022-08-01 17:35:56.92243] [31177] [info] WebSocket connection established
[2022-08-01T17:36:01.966720+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-08-01 17:36:01.96711] [31177] [info] Client closed connection
[2022-08-01T17:36:01.967698+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:36:03.05783] [31183] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/c44df758433fd601
[2022-08-01 17:36:03.05893] [31183] [info] Client accepted
[2022-08-01 17:36:03.07145] [31183] [info] WebSocket connection established
…
[2022-08-01T17:36:20.458505+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:36:20.45902] [31198] [info] WebSocket closed with status 1000.
[2022-08-01 17:36:21.55094] [31204] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/bc3a23989a5f2624
[2022-08-01 17:36:21.55219] [31204] [info] Client accepted
[2022-08-01 17:36:21.56381] [31204] [info] WebSocket connection established
[2022-08-01T17:36:22.144118+08:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.54 seconds for 7 candidate needles - make your needles more specific
[2022-08-01T17:36:22.144408+08:00] [debug] no match: -1.5s, best candidate: authentication-required-20210823 (0.00)
[2022-08-01T17:36:22.465719+08:00] [debug] >>> testapi::_check_backend_response: match=authentication-required-modify-system,authentication-required-user-settings timed out after 15 (check_screen)
[2022-08-01T17:36:22.504146+08:00] [debug] tests/migration/online_migration/online_migration_setup.pm:26 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1166 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:1047 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:882 called x11utils::handle_login -> lib/x11utils.pm:279 called testapi::assert_screen
[2022-08-01T17:36:22.504586+08:00] [debug] <<< testapi::assert_screen(mustmatch=[
    "generic-desktop",
    "gnome-activities",
    "opensuse-welcome"
  ], timeout=180)

So I suppose the ticket can be considered resolved.

Actions

Also available in: Atom PDF