action #124161

closed

[vmware][esxi] Frequent websocket connection establishing will cause sending key no response

Added by nanzhang about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2023-02-09
Due date:
% Done: 100%
Estimated time:

Description

Observation

Frequently re-establishing the websocket connection makes it hard to match the next needle, because the key press sent around the reconnect is lost. See the following job logs.

https://openqa.suse.de/tests/10443972/logfile?filename=autoinst-log.txt
(After sending key 'ret', the websocket connection was closed, so the key was never actually received by the installer.)

[2023-02-07T16:06:34.061841+01:00] [debug] [pid:11593] tests/migration/online_migration/online_migration_setup.pm:26 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:919 called opensusebasetest::wait_boot_past_bootloader -> lib/opensusebasetest.pm:798 called opensusebasetest::handle_displaymanager_login -> lib/opensusebasetest.pm:600 called x11utils::handle_login -> lib/x11utils.pm:287 called x11utils::select_user_gnome -> lib/x11utils.pm:361 called testapi::send_key
[2023-02-07T16:06:34.062001+01:00] [debug] [pid:11593] <<< testapi::send_key(key="ret", do_wait=0, wait_screen_change=0)
[2023-02-07T16:06:34.130287+01:00] [debug] [pid:11594] considering VNC stalled, no update for 5.42 seconds
[2023-02-07 16:06:34.13047] [18272] [info] Client closed connection

https://openqa.suse.de/tests/10448675/logfile?filename=autoinst-log.txt
(After sending key 'alt-n', the websocket connection was closed.)

[2023-02-08T04:14:50.502113+01:00] [debug] [pid:12018] tests/installation/scc_registration.pm:25 called registration::fill_in_registration_data -> lib/registration.pm:605 called registration::process_modules -> lib/registration.pm:1010 called registration::process_scc_register_addons -> lib/registration.pm:504 called testapi::wait_screen_change -> lib/registration.pm:504 called testapi::send_key
[2023-02-08T04:14:50.502296+01:00] [debug] [pid:12018] <<< testapi::send_key(key="alt-n", do_wait=0, wait_screen_change=0)
[2023-02-08T04:14:50.684616+01:00] [debug] [pid:12019] considering VNC stalled, no update for 1675826090.68 seconds
[2023-02-08 04:14:50.68489] [14326] [info] Client closed connection
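
The stall duration in the 'alt-n' log, 1675826090.68 seconds (about 53 years), looks suspicious. The following check, using only the Python standard library, compares each reported stall with the Unix time of its own log line. For the 'alt-n' line, and for the 1675929242.22 seconds reported later in the mouse_set log, the two agree to within rounding. This suggests, as an inference from the log values only and not from the os-autoinst source, that the elapsed time was measured against a last-update timestamp that was still 0.

```python
from datetime import datetime

# "considering VNC stalled" lines from this ticket: (log timestamp, reported stall in seconds)
SAMPLES = [
    ("2023-02-07T16:06:34.130287+01:00", 5.42),           # 'ret' case, plausible stall
    ("2023-02-08T04:14:50.684616+01:00", 1675826090.68),  # 'alt-n' case
    ("2023-02-09T08:54:02.223653+01:00", 1675929242.22),  # mouse_set case
]


def stall_equals_unix_time(log_time: str, stall_seconds: float, tol: float = 0.01) -> bool:
    """True if 'no update for N seconds' equals the Unix time of the log line itself,
    i.e. N == now - 0, as if the last-update timestamp had never been set."""
    unix_now = datetime.fromisoformat(log_time).timestamp()
    return abs(unix_now - stall_seconds) < tol


for log_time, stall in SAMPLES:
    print(log_time, stall, stall_equals_unix_time(log_time, stall))
```

The 5.42-second stall in the 'ret' log does not match, so it looks like an ordinary stall, while the other two look like a stall computed from an unset timestamp.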

http://10.67.129.96/tests/1848/logfile?filename=autoinst-log.txt
(Before sending key 'alt-y', the websocket connection was closed.)

[2023-02-08T17:35:48.410521+08:00] [warn] [pid:9161] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 4.30 seconds for 137 candidate needles - make your needles more specific
[2023-02-08T17:35:48.410756+08:00] [debug] [pid:9161] no match: 359.9s, best candidate: registration-refreshing-repository-text-20210317 (0.29)
[2023-02-08 17:36:03.42835] [9750] [info] Client closed connection
[2023-02-08 17:36:03.43039] [9750] [info] WebSocket closed with status 1000.
[2023-02-08T17:36:08.476712+08:00] [debug] [pid:9160] >>> testapi::_handle_found_needle: found registration-online-repos-15sp1-20181217, similarity 1.00 @ 258/336
[2023-02-08T17:36:08.478434+08:00] [debug] [pid:9160] tests/installation/scc_registration.pm:25 called registration::fill_in_registration_data -> lib/registration.pm:596 called registration::handle_scc_popups -> lib/registration.pm:663 called testapi::wait_screen_change
[2023-02-08T17:36:08.478679+08:00] [debug] [pid:9160] <<< testapi::wait_screen_change(timeout=10, similarity_level=50)
[2023-02-08T17:36:08.480838+08:00] [debug] [pid:9160] tests/installation/scc_registration.pm:25 called registration::fill_in_registration_data -> lib/registration.pm:596 called registration::handle_scc_popups -> lib/registration.pm:663 called testapi::wait_screen_change -> lib/registration.pm:663 called testapi::send_key
[2023-02-08T17:36:08.481091+08:00] [debug] [pid:9160] <<< testapi::send_key(key="alt-y", do_wait=0, wait_screen_change=0)
[2023-02-08T17:36:08.629838+08:00] [debug] [pid:9161] considering VNC stalled, no update for 20.22 seconds
[2023-02-08T17:36:08.631376+08:00] [debug] [pid:9161] Establishing VNC connection over WebSockets via https://10.67.131.2
[2023-02-08 17:36:09.69433] [9809] [info] Establishing WebSocket connection to wss://vh002.qa2.suse.asia:443/ticket/227709543f641d16
[2023-02-08 17:36:09.69545] [9809] [info] Client accepted
[2023-02-08 17:36:09.70746] [9809] [info] WebSocket connection established
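
To make the sequence in these logs easier to follow, here is a toy Python model. It is not os-autoinst code: the class, the method names and the 5-second `stall_timeout` are hypothetical, and it assumes that `send_key` only queues a key that is written to the VM later, so a watchdog-triggered close drops anything still queued. Replaying the 'ret' case under these assumptions ends with the key lost, matching the observation in the first log.

```python
class VncLinkModel:
    """Toy model of the stall-watchdog sequence seen in the logs (not os-autoinst code).

    Assumption: send_key() only queues the key; it reaches the VM when the
    connection is next flushed. A watchdog-triggered close discards the queue.
    """

    def __init__(self, stall_timeout: float):
        self.stall_timeout = stall_timeout  # hypothetical threshold in seconds
        self.last_update = 0.0
        self.pending = []    # keys queued on the current connection
        self.delivered = []  # keys the installer actually received
        self.lost = []       # keys dropped by a reconnect
        self.reconnects = 0

    def framebuffer_update(self, now: float) -> None:
        self.last_update = now

    def send_key(self, key: str) -> None:
        self.pending.append(key)

    def flush(self) -> None:
        self.delivered.extend(self.pending)
        self.pending.clear()

    def watchdog(self, now: float) -> None:
        # "considering VNC stalled, no update for N seconds" -> "Client closed connection"
        if now - self.last_update > self.stall_timeout:
            self.lost.extend(self.pending)
            self.pending.clear()
            # "WebSocket connection established": start counting again
            self.last_update = now
            self.reconnects += 1


# Replay of the 'ret' case, times in seconds after 16:06:00
link = VncLinkModel(stall_timeout=5.0)
link.framebuffer_update(28.71)  # last update, 5.42 s before the stall message
link.send_key("ret")            # 16:06:34.062
link.watchdog(34.13)            # 16:06:34.130, stall detected, connection replaced
link.flush()
print(link.delivered, link.lost, link.reconnects)  # -> [] ['ret'] 1
```

Without a stall before the flush, the same model delivers the key, which is the behaviour the tests expect.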

Steps to reproduce

Trigger the following test runs on OSD:
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP5&build=72.4&groupid=264
default_install_svirt
online_upgrade_sles15sp4_vmware
textmode_svirt

Impact

VMware 7.0 tests using the svirt backend cannot pass.

Problem

In my local openQA environment, some of the job runs passed. I am not sure what differs in the OSD environment.
default_install_svirt - http://10.67.129.96/tests/1875
online_upgrade_sles15sp4_vmware - http://10.67.129.96/tests/1873

Suggestion

Increase the minimum interval between websocket (re)connection attempts.
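
One way to read this suggestion is as a minimum interval between reconnect attempts. The Python sketch below is hypothetical: `ReconnectThrottle`, `may_reconnect` and the 30-second value are illustrative names and numbers, not part of os-autoinst.

```python
import time
from typing import Callable, Optional


class ReconnectThrottle:
    """Hypothetical helper: permit at most one (re)connection per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_attempt: Optional[float] = None  # None until the first attempt

    def may_reconnect(self) -> bool:
        now = self.clock()
        if self.last_attempt is not None and now - self.last_attempt < self.min_interval:
            return False  # too soon: keep the current connection
        self.last_attempt = now
        return True


# Usage with a fake clock so the example is deterministic
fake_now = [100.0]
throttle = ReconnectThrottle(min_interval=30.0, clock=lambda: fake_now[0])
print(throttle.may_reconnect())  # True: first attempt
fake_now[0] = 110.0
print(throttle.may_reconnect())  # False: only 10 s since the last attempt
fake_now[0] = 131.0
print(throttle.may_reconnect())  # True: 31 s have passed
```

Using `None` for "no attempt yet" together with a monotonic clock avoids computing an elapsed time against an uninitialised timestamp of 0.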

Workaround

No


Related issues: 1 (0 open, 1 closed)

Related to openQA Project - coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 (Resolved, okurz, 2021-10-11)

#1

Updated by livdywan about 1 year ago

  • Assignee deleted (mkittler)

Since you didn't mention having talked to @mkittler, I'll assume the assignee was accidental.

#2

Updated by nanzhang about 1 year ago

  • Related to coordination #100688: [epic][virtualization][3rd party hypervisor] Add svirt backend compatibility for vmware 7.0 added

#3

Updated by nanzhang about 1 year ago

Last good runs:
default_install_svirt - https://openqa.suse.de/tests/9693253
online_upgrade_sles15sp4_vmware - https://openqa.suse.de/tests/9693275
textmode_svirt - https://openqa.suse.de/tests/9693223

#4

Updated by nanzhang about 1 year ago

https://openqa.suse.de/tests/10456437/logfile?filename=autoinst-log.txt
-- The websocket connection was closed before mouse_set, so the mouse move did not actually take effect.

[2023-02-09T08:54:02.223653+01:00] [debug] [pid:32141] considering VNC stalled, no update for 1675929242.22 seconds
[2023-02-09 08:54:02.22386] [1926] [info] Client closed connection
[2023-02-09T08:54:02.224272+01:00] [debug] [pid:32141] Establishing VNC connection over WebSockets via https://esxi7.qa.suse.cz
[2023-02-09T08:54:03.182407+01:00] [debug] [pid:32140] tests/installation/welcome.pm:207 called testapi::assert_and_click
[2023-02-09T08:54:03.182684+01:00] [debug] [pid:32140] <<< testapi::mouse_set(x=1023, y=767)
[2023-02-09 08:54:03.31997] [1927] [info] Establishing WebSocket connection to wss://esxi7.qa.suse.cz:443/ticket/d3603ade2b11eff1
[2023-02-09 08:54:03.32148] [1927] [info] Client accepted
[2023-02-09 08:54:03.38180] [1927] [info] WebSocket connection established
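
To check whether an input action landed in the window between "Client closed connection" and "WebSocket connection established", the timestamps can be compared directly. This Python sketch assumes that the os-autoinst lines (with an offset such as +01:00) and the websocket proxy lines (no offset) use the same local time zone, which the values in these logs suggest; the function names are hypothetical.

```python
from datetime import datetime


def parse_log_time(ts: str) -> datetime:
    """Parse both timestamp styles in these logs as local wall-clock time.

    Assumption: "2023-02-09T08:54:03.182684+01:00" and "2023-02-09 08:54:03.31997"
    share one time zone, so the offset is simply dropped.
    """
    if "T" in ts:
        return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def in_reconnect_gap(closed: str, established: str, event: str) -> bool:
    """True if an input event happened after 'Client closed connection'
    and before 'WebSocket connection established'."""
    return parse_log_time(closed) <= parse_log_time(event) <= parse_log_time(established)


# mouse_set from the log above
print(in_reconnect_gap("2023-02-09 08:54:02.22386",
                       "2023-02-09 08:54:03.38180",
                       "2023-02-09T08:54:03.182684+01:00"))
# send_key('alt-y') from the third log in the description
print(in_reconnect_gap("2023-02-08 17:36:03.42835",
                       "2023-02-08 17:36:09.70746",
                       "2023-02-08T17:36:08.481091+08:00"))
```

Both calls print True. In the 'ret' case the key was sent about 68 ms before the close, so it falls just outside such a window rather than inside it.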

#5

Updated by okurz about 1 year ago

  • Tags set to vmware, esxi, stall, vnc, websocket
  • Subject changed from Frequent websocket connection establishing will cause sending key no response to [vmware][esxi] Frequent websocket connection establishing will cause sending key no response
  • Category set to Regressions/Crashes
  • Target version set to future

I assume this only happens on the vmware backend? Please restructure the ticket description following https://progress.opensuse.org/projects/openqav3/wiki/#Defects. This would help understanding and tracking of the issue.

#6

Updated by xlai about 1 year ago

@okurz Hi Oliver, this issue is blocking all VT tests on vmware 7, see details in https://openqa.suse.de/tests/overview?distri=sle&version=15-SP5&build=72.1&groupid=264.

VMware 7 is the only VMware product that we test, and it is a mandatory test for VT. Besides, for any other QA team that tests VMware, 7.0 is going to be the only version too (the QAC team is planning to migrate from 6.5 to 7.0 soon).

IMHO, this ticket deserves a higher priority. I also evaluated whether the VT team could fix it, but I am afraid that this time the problem is close to openQA internals (connection/VNC), so it would be better to get help from the openQA tools experts. Is it possible for the tools team to help fix it sooner?

@jstehlik FYI.

#7

Updated by okurz about 1 year ago

  • Target version changed from future to Ready

Yes, sure. Please follow my suggestion: restructure the ticket description following https://progress.opensuse.org/projects/openqav3/wiki/#Defects. This would help understanding and tracking of the issue.
Otherwise I doubt that the team would have an easy time helping here. I was expecting that this was clear and that such details would be added before we move it back to "Ready". Ok, adding to "Ready". If anyone picks up the ticket, making it easier to understand is then part of the job :)

#8

Updated by xlai about 1 year ago

Thanks a lot, Oliver.

@nanzhang Hi Nan, I hope this message reaches you. Would you please restructure the description following the guide linked above? It will speed up the fix for the blocker in your VMware tests ;-).

#9

Updated by nanzhang about 1 year ago

  • Description updated (diff)
  • Target version changed from Ready to future

#10

Updated by nanzhang about 1 year ago

I have already updated the description according to the defect template. Thanks.

#11

Updated by livdywan about 1 year ago

  • Target version changed from future to Ready

I assume the target change was accidental.

#12

Updated by okurz about 1 year ago

  • Target version changed from Ready to future

Sorry, we can't help with that right now.

#13

Updated by openqa_review about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default_install_svirt@svirt-vmware70
https://openqa.suse.de/tests/10737639#step/welcome/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 52 days if nothing changes in this ticket.

#14

Updated by nanzhang 11 months ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

This issue has already been resolved by ticket https://progress.opensuse.org/issues/128177, so I am marking it as resolved.
