action #114820
closedError connecting to VNC over WebSockets server provided by VMWare
Description
Observed vnc connection error at the end of the test 'online_upgrade_sles15sp3_vmware'.
http://10.67.129.66/tests/1368
qa2-dhcp-236 login: [37m[2022-07-28T15:42:53.499053+08:00] [debug] considering VNC stalled, no update for 7.65 seconds[0m
[37m[2022-07-28T15:42:53.503015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2[0m
[32m[2022-07-28T15:42:53.527221+08:00] [debug] >>> testapi::_handle_found_needle: found displaymanager-20210812, similarity 1.00 @ 312/312[0m
[31m[2022-07-28T15:42:55.669842+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused[0m
[31m[2022-07-28T15:42:56.739028+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused[0m
http://10.67.129.66/tests/1372
[37m[2022-07-29T13:53:14.086680+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901[0m
[37m[2022-07-29T13:53:14.094162+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2[0m
[2022-07-29 13:53:14.20162] [12112] [info] Rejecting new client; already one client connected
[2022-07-29 13:53:14.20229] [12112] [info] Client closed connection
[2022-07-29 13:53:14.20418] [12112] [info] WebSocket closed with status 1000.
[33m[2022-07-29T13:53:14.349552+08:00] [info] ::: basetest::runtest: # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 187.
Updated by nanzhang about 2 years ago
- Related to coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 added
Updated by mkittler about 2 years ago
About the 1st test:
The logs when it establishes the VNC connection over WebSockets look like this:
[2022-07-28T15:41:27.755085+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:346 called testapi::select_console
[2022-07-28T15:41:27.755297+08:00] [debug] <<< testapi::select_console(testapi_console="sut")
[2022-07-28T15:41:27.757702+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901
[2022-07-28 15:41:27.76429] [12482] [info] Client closed connection
[2022-07-28T15:41:27.766355+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-28 15:41:27.76790] [12482] [info] WebSocket closed with status 1000.
[2022-07-28 15:41:28.85866] [13732] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/3e87a804707f97bf
[2022-07-28 15:41:28.85981] [13732] [info] Client accepted
[2022-07-28 15:41:28.87816] [13732] [info] WebSocket connection established
[2022-07-28T15:41:28.948694+08:00] [debug] activate_console, console: sut, type:
[2022-07-28T15:41:28.948883+08:00] [debug] activate_console called with generic type, no action
So there was a connection before and it has re-established a new one.
Apparently our backend's VNC client closed the TCP connection to dewebsockify
(Client closed connection
) which then closed the WebSockets connection normally (WebSocket closed with status 1000
):
[2022-07-28T15:42:14.749203+08:00] [debug] no match: 556.8s, best candidate: displaymanager-2-20200311 (0.00)
[2022-07-28 15:42:41.27382] [13732] [info] Client closed connection
[2022-07-28 15:42:41.27579] [13732] [info] WebSocket closed with status 1000.
[2022-07-28T15:42:45.826+08:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 30.77 seconds for 39 candidate needles - make your needles more specific
[2022-07-28T15:42:45.826292+08:00] [debug] no match: 555.8s, best candidate: displaymanager-2-20200311 (0.00)
[2022-07-28T15:42:45.848333+08:00] [warn] !!! backend::baseclass::do_capture: There is some problem with your environment, we detected a stall for 31.0974600315094 seconds
Likely this is due to the detected stall. Then the backend tries to re-establish the WebSocket connection (Establishing VNC connection over WebSockets…
) but it somehow doesn't work as the VNC client cannot connect to dewebsockify
(Connection refused
). There are no log messages from dewebsockify
but also no errors about launching it.
Welcome to SUSE Linux Enterprise Server 15 SP4 (x86_64) - Kernel 5.14.21-150400.22-default (ttyS0).
eth0: 10.67.129.236 fe80::20c:29ff:fee2:b526
qa2-dhcp-236 login: [2022-07-28T15:42:53.499053+08:00] [debug] considering VNC stalled, no update for 7.65 seconds
[2022-07-28T15:42:53.503015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-28T15:42:53.527221+08:00] [debug] >>> testapi::_handle_found_needle: found displaymanager-20210812, similarity 1.00 @ 312/312
[2022-07-28T15:42:55.669842+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:56.739028+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:57.821556+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:42:58.947867+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:43:00.239502+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
[2022-07-28T15:43:01.887419+08:00] [warn] !!! consoles::VNC::login: Error connecting to VNC over WebSockets server provided by VMWare <127.0.0.1:5901>: IO::Socket::INET: connect: Connection refused
Unexpected end of data 0
[2022-07-28T15:43:07.363730+08:00] [debug] backend process exited: 0
About the 2nd test:
It generally re-establishes the WebSocket connection when re-connecting on a VNC stall:
x11utils::handle_login -> lib/x11utils.pm:277 called testapi::wait_still_screen
[2022-07-29T13:32:00.687192+08:00] [debug] <<< testapi::wait_still_screen(timeout=30, stilltime=7, similarity_level=47)
[2022-07-29T13:32:05.477899+08:00] [debug] considering VNC stalled, no update for 4.13 seconds
[2022-07-29T13:32:05.478713+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:06.55828] [12104] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/e694bba6922ac435
[2022-07-29 13:32:06.56013] [12104] [info] Client accepted
[2022-07-29 13:32:06.57209] [12104] [info] WebSocket connection established
[2022-07-29T13:32:10.858549+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-07-29 13:32:10.85895] [12104] [info] Client closed connection
[2022-07-29T13:32:10.859444+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:11.93882] [12105] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/a3b949e095236e87
[2022-07-29 13:32:11.93993] [12105] [info] Client accepted
[2022-07-29 13:32:11.95380] [12105] [info] WebSocket connection established
[2022-07-29T13:32:15.010037+08:00] [debug] >>> testapi::wait_still_screen: detected same image for 7 seconds, last detected similarity is 1000000
[2022-07-29T13:32:15.010425+08:00] [debug] tests/migration/online_migration/zypper_patch.pm:37 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1166 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:1047 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:882 called x11utils::handle_login -> lib/x11utils.pm:278 called testapi::check_screen
[2022-07-29T13:32:15.010717+08:00] [debug] <<< testapi::check_screen(mustmatch=[
"authentication-required-user-settings",
"authentication-required-modify-system"
], timeout=15)
[2022-07-29T13:32:15.266956+08:00] [debug] no match: 14.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:16.014420+08:00] [debug] no change: 13.9s
[2022-07-29T13:32:16.144138+08:00] [debug] no match: 13.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:16.239328+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-07-29 13:32:16.23971] [12105] [info] Client closed connection
[2022-07-29T13:32:16.240015+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:17.31037] [12107] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/a792a421c9a7067d
[2022-07-29 13:32:17.31156] [12107] [info] Client accepted
[2022-07-29 13:32:17.32297] [12107] [info] WebSocket connection established
[2022-07-29T13:32:17.347327+08:00] [debug] no change: 12.6s
[2022-07-29T13:32:17.472721+08:00] [debug] no match: 12.6s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:18.016319+08:00] [debug] no change: 11.9s
[2022-07-29T13:32:18.140001+08:00] [debug] no match: 11.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:19.438820+08:00] [debug] no match: 10.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:20.018165+08:00] [debug] no change: 9.9s
[2022-07-29T13:32:20.142951+08:00] [debug] no match: 9.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:21.019346+08:00] [debug] no change: 8.9s
[2022-07-29T13:32:21.142609+08:00] [debug] no match: 8.9s, best candidate: authentication-required-20210823 (0.00)
[2022-07-29T13:32:21.746633+08:00] [debug] considering VNC stalled, no update for 4.02 seconds
[2022-07-29 13:32:21.74699] [12107] [info] Client closed connection
[2022-07-29T13:32:21.747410+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:32:22.81808] [12108] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/bd8c34bdba437eb4
[2022-07-29 13:32:22.81939] [12108] [info] Client accepted
[2022-07-29 13:32:22.83111] [12108] [info] WebSocket connection established
The problem happens shortly after the VM is rebooted:
[2022-07-29T13:52:58.693752+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:337 called testapi::select_console -> lib/susedistribution.pm:847 called utils::save_svirt_pty -> lib/utils.pm:133 called testapi::enter_cmd
[2022-07-29T13:52:58.694145+08:00] [debug] <<< testapi::type_string(string="pty=`virsh dumpxml openQA-SUT-1 2>/dev/null | grep \"console type=\" | sed \"s/'/ /g\" | awk '{ print \$5 }'`\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[ 1316.059755] reboot: Restarting system
[2022-07-29T13:53:02.573818+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:337 called testapi::select_console -> lib/susedistribution.pm:847 called utils::save_svirt_pty -> lib/utils.pm:134 called testapi::enter_cmd
[2022-07-29T13:53:02.574237+08:00] [debug] <<< testapi::type_string(string="echo \$pty\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-07-29T13:53:02.949872+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:345 called testapi::wait_serial
[2022-07-29T13:53:02.950324+08:00] [debug] <<< testapi::wait_serial(no_regex=0, timeout=180, record_output=undef, expect_not_found=0, quiet=undef, buffer_size=undef, regexp="GNU GRUB")
GNU GRUB version 2.06
+----------------------------------------------------------------------------+||||||||||||||||||||||||+----------------------------------------------------------------------------+ Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, `e' to edit the commands
before booting or `c' for a command-line. *SLES 15-SP4 Advanced options for SLES 15-SP4 Start bootloader from a read-only snapshot [2022-07-29T13:53:14.084455+08:00] [debug] >>> testapi::wait_serial: GNU GRUB: ok
[2022-07-29T13:53:14.084802+08:00] [debug] tests/migration/online_migration/zypper_migration.pm:162 called power_action_utils::power_action -> lib/power_action_utils.pm:346 called testapi::select_console
[2022-07-29T13:53:14.085032+08:00] [debug] <<< testapi::select_console(testapi_console="sut")
[2022-07-29T13:53:14.086680+08:00] [debug] Establishing VNC connection to 10.67.131.2:5901
[2022-07-29T13:53:14.094162+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-07-29 13:53:14.20162] [12112] [info] Rejecting new client; already one client connected
[2022-07-29 13:53:14.20229] [12112] [info] Client closed connection
[2022-07-29 13:53:14.20418] [12112] [info] WebSocket closed with status 1000.
[2022-07-29T13:53:14.349552+08:00] [info] ::: basetest::runtest: # Test died: unexpected end of data at /usr/lib/os-autoinst/consoles/VNC.pm line 187.
at /usr/lib/os-autoinst/testapi.pm line 1724.
testapi::select_console("sut") called at sle/lib/power_action_utils.pm line 346
power_action_utils::power_action("reboot", "textmode", 1) called at sle/tests/migration/online_migration/zypper_migration.pm line 162
zypper_migration::run(zypper_migration=HASH(0x560697eb1498)) called at /usr/lib/os-autoinst/basetest.pm line 328
eval {...} called at /usr/lib/os-autoinst/basetest.pm line 322
basetest::runtest(zypper_migration=HASH(0x560697eb1498)) called at /usr/lib/os-autoinst/autotest.pm line 364
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 364
autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 240
eval {...} called at /usr/lib/os-autoinst/autotest.pm line 240
autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 291
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0), CODE(0x56069953aa48)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 477
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x56069953e3f0)) called at /usr/lib/os-autoinst/autotest.pm line 293
autotest::start_process() called at /usr/bin/isotovideo line 262
The log message Rejecting new client; already one client connected
indicates that the backend tries to create another VNC connection while one is still present (dewebsockify
is still running and thinks our VNC client is connected to it). I suppose that's the problem. Now the question to answer is why it happened here and how it can be fixed.
Updated by mkittler about 2 years ago
but it somehow doesn't work as the VNC client cannot connect to
dewebsockify
I suppose I know why that is. I'm working on a fix.
Updated by mkittler about 2 years ago
- Status changed from In Progress to Feedback
The cause is that dewebsockify
is not launched a second time due to a logic error. The second test failure has probably the same cause and just looks different because the previous dewebsockify
instance is actually still running.
So this PR should address both issues: https://github.com/os-autoinst/os-autoinst/pull/2136
I have only tested it via unit tests so it would be great if you could checkout my changes on your worker host and try them out.
Updated by okurz about 2 years ago
- Due date set to 2022-08-13
- Category set to Regressions/Crashes
- Target version set to Ready
Updated by nanzhang about 2 years ago
- Status changed from Feedback to In Progress
Verified this PR, and it got passed after applying the fix. Thanks
http://10.67.129.66/tests/1374#
Updated by mkittler about 2 years ago
- Status changed from In Progress to Resolved
Yes, it looks like it now survives multiple re-connects:
[2022-08-01T17:35:51.029054+08:00] [debug] <<< testapi::wait_still_screen(similarity_level=47, timeout=30, stilltime=7)
[2022-08-01T17:35:55.829717+08:00] [debug] considering VNC stalled, no update for 4.14 seconds
[2022-08-01 17:35:55.83001] [31166] [info] Client closed connection
[2022-08-01T17:35:55.831210+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:35:56.90925] [31177] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/2278fd27436587ad
[2022-08-01 17:35:56.91047] [31177] [info] Client accepted
[2022-08-01 17:35:56.92243] [31177] [info] WebSocket connection established
[2022-08-01T17:36:01.966720+08:00] [debug] considering VNC stalled, no update for 4.01 seconds
[2022-08-01 17:36:01.96711] [31177] [info] Client closed connection
[2022-08-01T17:36:01.967698+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:36:03.05783] [31183] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/c44df758433fd601
[2022-08-01 17:36:03.05893] [31183] [info] Client accepted
[2022-08-01 17:36:03.07145] [31183] [info] WebSocket connection established
…
[2022-08-01T17:36:20.458505+08:00] [debug] Establishing VNC connection over WebSockets via https://10.67.131.2
[2022-08-01 17:36:20.45902] [31198] [info] WebSocket closed with status 1000.
[2022-08-01 17:36:21.55094] [31204] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/bc3a23989a5f2624
[2022-08-01 17:36:21.55219] [31204] [info] Client accepted
[2022-08-01 17:36:21.56381] [31204] [info] WebSocket connection established
[2022-08-01T17:36:22.144118+08:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.54 seconds for 7 candidate needles - make your needles more specific
[2022-08-01T17:36:22.144408+08:00] [debug] no match: -1.5s, best candidate: authentication-required-20210823 (0.00)
[2022-08-01T17:36:22.465719+08:00] [debug] >>> testapi::_check_backend_response: match=authentication-required-modify-system,authentication-required-user-settings timed out after 15 (check_screen)
[2022-08-01T17:36:22.504146+08:00] [debug] tests/migration/online_migration/online_migration_setup.pm:26 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:1166 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:1047 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:882 called x11utils::handle_login -> lib/x11utils.pm:279 called testapi::assert_screen
[2022-08-01T17:36:22.504586+08:00] [debug] <<< testapi::assert_screen(mustmatch=[
"generic-desktop",
"gnome-activities",
"opensuse-welcome"
], timeout=180)
So I suppose the ticket can be considered resolved.