action #165923
open[qa-tools][vmware][spikesolution][timeboxed:20h] VNC reconnect after reboot size:S
0%
Description
Observation
openQA test in scenario sle-micro-6.1-Base-VMware-x86_64-cloud-init@svirt-vmware70 fails in
suseconnect_scc
VNC has proven to be unstable on VMware. This shows especially when a reboot occurs: the SUT usually waits at the grub2 screen, and the test is expected to confirm the default grub2 entry.
Most of the failures currently occurring on VMware are caused by a missed 'ret' key in grub2 or by the VNC connection getting stalled.
This was not really a problem with SLES because reboots occur far less often there than in SLEM tests.
[2024-08-28T11:34:27.435259+02:00] [debug] [pid:40411] tests/console/suseconnect_scc.pm:44 called transactional::process_reboot -> products/sle-micro/../../lib/transactional.pm:118 called testapi::wait_screen_change
[2024-08-28T11:34:27.435393+02:00] [debug] [pid:40411] <<< testapi::wait_screen_change(timeout=10, similarity_level=50)
[2024-08-28T11:34:27.437833+02:00] [debug] [pid:40411] tests/console/suseconnect_scc.pm:44 called transactional::process_reboot -> products/sle-micro/../../lib/transactional.pm:118 called testapi::wait_screen_change -> products/sle-micro/../../lib/transactional.pm:118 called testapi::send_key
[2024-08-28T11:34:27.437971+02:00] [debug] [pid:40411] <<< testapi::send_key(key="ret", wait_screen_change=0)
[2024-08-28T11:34:57.154911+02:00] [debug] [pid:40412] considering VNC stalled, no update for 30.03 seconds
[2024-08-28T11:34:57.157505+02:00] [debug] [pid:40412] Establishing VNC connection over WebSockets via https://esxi7.qa.suse.cz
[2024-08-28 11:34:58.24094] [42759] [info] Establishing WebSocket connection to wss://esxi7.qa.suse.cz:443/ticket/6c8025b2e3c3a460
[2024-08-28 11:34:58.24231] [42759] [info] Client accepted
[2024-08-28 11:34:58.26160] [42759] [info] WebSocket connection established
[2024-08-28T11:35:01.079317+02:00] [debug] [pid:40411] >>> testapi::wait_screen_change: screen change seen after 33.3535449504852 seconds (similarity: 21.3448611590361)
[2024-08-28T11:35:01.079747+02:00] [debug] [pid:40411] tests/console/suseconnect_scc.pm:44 called transactional::process_reboot -> products/sle-micro/../../lib/transactional.pm:120 called testapi::assert_screen
So the problem is that the VNC connection gets lost and our current stall detection is not good enough: it re-establishes the connection but does not re-send any keys or clicks that may have been lost in the meantime.
- https://openqa.suse.de/tests/15282809#step/suseconnect_scc/22
- https://openqa.suse.de/tests/15282550#step/suseconnect_scc/12
- https://openqa.suse.de/tests/15282568#step/host_config/11
- https://openqa.suse.de/tests/15099017#step/trup_smoke/41
- https://openqa.suse.de/tests/15282806#step/transactional_update/25
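The missing re-send described above could be worked around on the test side by repeating the key press until the screen reacts. The following is only a minimal sketch of that idea in Python, not os-autoinst code: `send_key_until_screen_change` and both callables passed to it are hypothetical names introduced for illustration, and the sketch assumes that an extra 'ret' is harmless at the point where it is used.

```python
from typing import Callable


def send_key_until_screen_change(
    send_key: Callable[[str], None],
    screen_changed: Callable[[float], bool],
    key: str = "ret",
    attempts: int = 3,
    timeout: float = 10.0,
) -> int:
    """Send `key` and re-send it while no screen change is observed.

    `send_key` and `screen_changed` are hypothetical stand-ins for whatever
    the test framework offers to press a key and to wait up to `timeout`
    seconds for the screen to change.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        send_key(key)
        if screen_changed(timeout):
            return attempt
    raise TimeoutError(f"no screen change after sending {key!r} {attempts} times")
```

Note that in the log above a screen change was eventually reported after the reconnect, so a screen change alone may not prove that the key arrived. In os-autoinst test code the existing `testapi::send_key_until_needlematch` may be a closer fit because it checks for a specific needle instead of any change.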
Reproducible
Fails since (at least) Build 3.12
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Acceptance criteria
- AC1: We know how the problematic situation can be fixed or worked around.
Suggestions
- Maybe the situation (a stalled VNC connection with a possibly lost key press) can be detected
- Re-establish the VNC connection from test code or from testapi, taking care that dewebsockify is also restarted as needed
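As a starting point for detecting the situation, the log excerpt in the observation already contains enough information: a "considering VNC stalled, no update for N seconds" message tells when the last update was seen, and any `send_key` call logged inside that window is a candidate for a lost key. The sketch below does this as an offline analysis of log lines in Python. It assumes the line formats shown in the excerpt; the function name `keys_sent_during_stall` is hypothetical and not part of os-autoinst.

```python
import re
from datetime import datetime, timedelta

# Timestamp format as seen in the excerpt, e.g. [2024-08-28T11:34:27.437971+02:00]
TS = r"\[(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{2}:\d{2})\]"
SEND_KEY_RE = re.compile(TS + r'.*<<< testapi::send_key\(key="(?P<key>[^"]+)"')
STALL_RE = re.compile(
    TS + r".*considering VNC stalled, no update for (?P<secs>[\d.]+) seconds"
)


def keys_sent_during_stall(log_lines):
    """Return (timestamp, key) pairs for send_key calls inside a stall window.

    The stall window spans from the last VNC update (report time minus the
    reported number of seconds) up to the time the stall was reported.
    """
    sent = []
    suspicious = []
    for line in log_lines:
        match = SEND_KEY_RE.search(line)
        if match:
            sent.append((datetime.fromisoformat(match["ts"]), match["key"]))
            continue
        match = STALL_RE.search(line)
        if match:
            reported = datetime.fromisoformat(match["ts"])
            last_update = reported - timedelta(seconds=float(match["secs"]))
            suspicious += [(ts, key) for ts, key in sent if last_update <= ts <= reported]
    return suspicious
```

Applied to the excerpt above, the 'ret' sent at 11:34:27.437971 falls inside the window that ends with the stall report at 11:34:57.154911, which matches the observed failure. The same check done inside the backend at reconnect time could decide whether a re-send is needed, but that part is not sketched here.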
Updated by mloviska 4 months ago
- Related to action #164922: [vmware] add combustion & ignition support added
Updated by mdati 4 months ago
Probably a similar or the same issue as the one dealt with in https://progress.opensuse.org/issues/165890
Updated by mdati 4 months ago
- Related to action #162941: Add job group definitions for SLEM 6.0 to QAC-yaml added
Updated by mkittler 4 months ago
- Related to action #165890: [qe-core][qe-virt]test fails in host_config - Cannot login the system with enter 'ret' key added
Updated by okurz 3 months ago
- Category set to Feature requests
- Target version changed from Ready to future
I am kindly asking anybody outside the tools team to look into this, especially if you have more experience with VMware-based openQA tests. We have really good code coverage by tests within os-autoinst nowadays, so it should be relatively easy to integrate code changes.
Updated by openqa_review about 2 months ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: cloud-init@svirt-vmware70
https://openqa.suse.de/tests/15846827#step/first_boot/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.