action #97517

[virtualization][3rd party hypervisor][vmware] Executing command returns 'undef' value with assert_script_run after vm reboot

Added by nanzhang 10 months ago. Updated 10 months ago.

Status: Resolved
Priority: Low
Assignee: -
Category: Concrete Bugs
Target version:
Start date: 2021-08-24
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

The command ran successfully (see the screenshot in the linked test step), but the call still failed: it timed out with an 'undef' return value.
The openQA job was run with PR#13036 (https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13036).

The failure shows in this test job:
http://openqa.nue.suse.com/tests/6939481#step/esxi_open_vm_tools/59

Steps to reproduce

Problem

H1. The "undef" return value from script_run causes the timeout failure

Suggestion

Workaround

No

assert_script_run.png (33.1 KB) nanzhang, 2021-08-25 14:36

History

#1 Updated by okurz 10 months ago

  • Project changed from openQA Infrastructure to openQA Project
  • Due date set to 2021-09-09
  • Category set to Concrete Bugs
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready

The "undef" return value means that script_run timed out, so it returned neither "0" for success nor a non-zero exit code for failure. But of course I agree that the call should not time out, as the ping itself looks completely fine. https://openqa.nue.suse.com/tests/6939481#step/esxi_open_vm_tools/57 shows that after the ping command the token "LHroM" should have been written to /dev/ttyS0, but https://openqa.nue.suse.com/tests/6939481/logfile?filename=serial0.txt does not show that token. So the token was never received, and hence the command times out.
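The three possible outcomes described above can be illustrated with a minimal sketch in plain Perl. This is not the os-autoinst implementation: the helper name result_from_serial is hypothetical, and the marker format "<token>-<exit code>-" is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of os-autoinst): derive a script_run-style
# result from the text captured on the serial line.
sub result_from_serial {
    my ($serial_output, $token) = @_;

    # Assumption: the marker is written as "<token>-<exit code>-".
    return $1 + 0 if $serial_output =~ /\Q$token\E-(\d+)-/;

    # Marker never arrived on the serial line: the caller sees a timeout.
    return;
}

my $token = 'LHroM';
my @cases = (
    ["ping ok\n${token}-0-\n",     'marker with exit code 0'],
    ["ping failed\n${token}-1-\n", 'marker with exit code 1'],
    ['',                           'serial log without the marker'],
);

for my $case (@cases) {
    my ($log, $label) = @$case;
    my $ret = result_from_serial($log, $token);
    printf "%s: %s\n", $label, defined $ret ? $ret : 'undef';
}
```

The last case corresponds to this ticket: the command itself succeeds, but because nothing reaches the serial log, the result stays undefined regardless of the real exit code.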

As this is running on "vmware" I assume it is an issue specific to that backend.

It would help us if you could fill in the ticket according to the ticket template for defects https://progress.opensuse.org/projects/openqav3/wiki/#Defects . Could you please try that, and especially fill in the information about reproducibility?

#2 Updated by nanzhang 10 months ago

  • Description updated (diff)

#3 Updated by nanzhang 10 months ago

Thank you Oliver for looking into this issue. I've updated the description as per the ticket template for defects.

Another finding is that the issue only happens when the VM is shut down completely and then booted up again. A VM reboot does not cause this issue, which I verified in my local openQA instance. After rebooting the VM, the token "LHroM" is received normally from ttyS0 - http://10.67.129.66/tests/240#step/esxi_open_vm_tools/53

#4 Updated by nanzhang 10 months ago

  • Status changed from Feedback to New

#5 Updated by nanzhang 10 months ago

I got the same issue on VMware ESXi 6.7 - http://10.67.129.66/tests/243#step/esxi_open_vm_tools/114

#6 Updated by okurz 10 months ago

  • Subject changed from [virtualization][3rd party hypervisor] Executing command returns 'undef' value with assert_script_run after vm reboot to [virtualization][3rd party hypervisor][vmware] Executing command returns 'undef' value with assert_script_run after vm reboot
  • Due date deleted (2021-09-09)
  • Assignee deleted (okurz)
  • Priority changed from Normal to Low
  • Target version changed from Ready to future

I see. So the impact seems to be limited to the vmware backend. Good that you could reproduce it on another vmware host. Within the QE Tools team I doubt we will be able to help more. You are on your own, sorry.

#7 Updated by nanzhang 10 months ago

Actually, we have two test cases which need to shut down the VM during the test run, and this issue currently blocks those automated tests.

#8 Updated by nanzhang 10 months ago

  • Status changed from New to Resolved

I've found a solution. After shutting down and booting up the VM, the serial channel to the VM must be set up again before switching to the SUT console.

Just add the following line to the test code:
console('svirt')->start_serial_grab;

#9 Updated by nanzhang 10 months ago
