action #97517
[virtualization][3rd party hypervisor][vmware] Executing command returns 'undef' value with assert_script_run after vm reboot (closed)
Description
Observation
The command ran successfully (see the screenshot referenced in the steps below), but assert_script_run failed and timed out with an 'undef' return value.
The openQA job was run with PR #13036 (https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13036).
The failure shows in this test job:
http://openqa.nue.suse.com/tests/6939481#step/esxi_open_vm_tools/59
Steps to reproduce
- Shut down the VM, then power it on
- select_console 'sut'
- Run a ping command with assert_script_run in the VM (screenshot: http://openqa.nue.suse.com/tests/6939481#step/esxi_open_vm_tools/57)
- The command execution fails with a timeout error
Problem
H1. The "undef" return value from script_run causes the timeout failure
Suggestion
Workaround
No
Updated by okurz over 3 years ago
- Project changed from openQA Infrastructure (public) to openQA Project (public)
- Due date set to 2021-09-09
- Category set to Regressions/Crashes
- Status changed from New to Feedback
- Assignee set to okurz
- Target version set to Ready
The "undef" return value means that the script_run command timed out, so it returned neither "0" for success nor a non-zero exit code for failure. But of course I agree that the call should not time out, as the ping itself looks completely fine. https://openqa.nue.suse.com/tests/6939481#step/esxi_open_vm_tools/57 shows that after the ping command the token "LHroM" should have been written to /dev/ttyS0, but https://openqa.nue.suse.com/tests/6939481/logfile?filename=serial0.txt does not show that token. So the token was never received, and hence the command times out.
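To make that handshake concrete: after the command, a marker writes a token to the serial device, and the test waits for that token on the serial log. The following is only a minimal Python model of the idea, not the os-autoinst testapi (which is Perl). The helper names wrap_command and wait_for_token, the marker format "<token>-<exit code>-" and the example command "ping -c 3 localhost" are assumptions for illustration; None stands in for Perl's undef.

```python
import re
import time
from typing import Callable, Optional


def wrap_command(cmd: str, token: str, serialdev: str = "ttyS0") -> str:
    # Assumed marker: append an echo of the token and the exit status ($?)
    # to the serial device, so the test can read the result from the serial log.
    return f"{cmd}; echo {token}-$?- > /dev/{serialdev}"


def wait_for_token(read_serial: Callable[[], str], token: str,
                   timeout: float, poll: float = 0.05) -> Optional[int]:
    """Poll the serial log until '<token>-<exit code>-' shows up.

    Returns the exit code, or None (the model's stand-in for Perl's undef)
    if the token never arrives before the timeout.
    """
    pattern = re.compile(re.escape(token) + r"-(\d+)-")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        match = pattern.search(read_serial())
        if match is not None:
            return int(match.group(1))
        time.sleep(poll)
    return None


if __name__ == "__main__":
    print(wrap_command("ping -c 3 localhost", "LHroM"))
    # Token present on the serial log: the exit code is returned.
    print(wait_for_token(lambda: "LHroM-0-\n", "LHroM", timeout=1))
    # Token never written to the serial log: the wait times out with None.
    print(wait_for_token(lambda: "", "LHroM", timeout=0.2))
```

In this model an exit code of 0 and a timeout are clearly distinct results, which is why a missing token surfaces as undef rather than as a failed command.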
As this is running on "vmware", I assume it is an issue specific to that backend.
It would help us if you could fill in the ticket according to the ticket template for defects (https://progress.opensuse.org/projects/openqav3/wiki/#Defects). Could you please try that, and especially fill in the information about reproducibility?
Updated by nanzhang over 3 years ago
Thank you Oliver for looking into this issue. I've updated the description as per the ticket template for defects.
Another finding is that the issue only happens when the VM is shut down completely and then booted up. A VM reboot does not cause this issue; I verified that in my local openQA instance. After rebooting the VM, the token "LHroM" is received normally from ttyS0: http://10.67.129.66/tests/240#step/esxi_open_vm_tools/53
Updated by nanzhang over 3 years ago
I got the same issue on VMware ESXi 6.7: http://10.67.129.66/tests/243#step/esxi_open_vm_tools/114
Updated by okurz over 3 years ago
- Subject changed from [virtualization][3rd party hypervisor] Executing command returns 'undef' value with assert_script_run after vm reboot to [virtualization][3rd party hypervisor][vmware] Executing command returns 'undef' value with assert_script_run after vm reboot
- Due date deleted (2021-09-09)
- Assignee deleted (okurz)
- Priority changed from Normal to Low
- Target version changed from Ready to future
I see. So the impact seems to be limited to the vmware backend. Good that you could reproduce it on another vmware host. Within the QE Tools team, I doubt we will be able to help more. You are on your own, sorry.
Updated by nanzhang over 3 years ago
Actually, we have two test cases that need to shut down the VM during the test run, and this issue currently blocks their automation.
Updated by nanzhang over 3 years ago
- Status changed from New to Resolved
I've found a solution. After shutting down and booting up the VM, the serial channel to the VM has to be set up again before switching to the SUT console.
I just added the following line to my test code:
console('svirt')->start_serial_grab;
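The ticket does not say what exactly goes stale inside the vmware backend, so the following is only a toy Python model under one loudly stated assumption: the serial grab stays bound to the serial channel of the VM session it was started for, and a full shutdown and power-on gives the VM a new channel. The classes VM and SerialGrab and the methods power_cycle, write_serial and read are hypothetical; start_serial_grab merely mirrors the name of the Perl method above. In the real test, the call goes after powering the VM back on and before select_console 'sut'.

```python
from typing import List, Optional


class VM:
    """Toy VM: serial output goes to the channel of the current power-on session."""

    def __init__(self) -> None:
        self.channel: List[str] = []

    def power_cycle(self) -> None:
        # Assumption: shutdown + power-on gives the VM a brand-new serial channel.
        self.channel = []

    def write_serial(self, text: str) -> None:
        # The guest writes to the channel belonging to the current session.
        self.channel.append(text)


class SerialGrab:
    """Toy stand-in for the serial grab started from the svirt console."""

    def __init__(self, vm: VM) -> None:
        self.vm = vm
        self.attached: Optional[List[str]] = None

    def start_serial_grab(self) -> None:
        # (Re)attach to the channel the VM is using right now.
        self.attached = self.vm.channel

    def read(self) -> str:
        # Only data on the attached channel is visible to the test.
        return "".join(self.attached) if self.attached is not None else ""


if __name__ == "__main__":
    # Without the fix: the grab was started before the power cycle.
    vm = VM()
    grab = SerialGrab(vm)
    grab.start_serial_grab()
    vm.power_cycle()
    vm.write_serial("LHroM-0-\n")
    print("LHroM" in grab.read())  # False: the grab still reads the old channel

    # With the fix: restart the grab after power-on, before running commands.
    vm.power_cycle()
    grab.start_serial_grab()
    vm.write_serial("LHroM-0-\n")
    print("LHroM" in grab.read())  # True
```

Under this assumption, the token is sent but lands on a channel nobody is reading, which matches serial0.txt missing "LHroM" only after a full shutdown; a plain reboot would keep the same channel.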
Updated by nanzhang over 3 years ago
Verification run: http://10.67.129.66/tests/313