action #128558

closed

[sporadic] test fails in shutdown

Added by syrianidou_sofia over 1 year ago. Updated 10 months ago.

Status:
Rejected
Priority:
Low
Assignee:
-
Target version:
-
Start date:
2023-05-03
Due date:
% Done:

0%

Estimated time:

Description

After the password is typed, a blank screen appears that should match the needle here:
https://openqa.suse.de/tests/10931763#step/shutdown/20
but the match fails. We should investigate whether the shutdown module (or the libraries it calls) has been modified, or whether some needles have been removed.

Observation

openQA test in scenario sle-15-SP5-Online-x86_64-create_hdd_gnome_libyui@svirt-xen-hvm fails in
shutdown

Test suite description

Boot into gnome HDD, configure and republish image for tests that use libyui.

Reproducible

Fails since (at least) Build 61.1

Expected result

Last good: 93.2 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by syrianidou_sofia over 1 year ago

  • Project changed from openQA Tests (public) to qe-yam
  • Category deleted (Bugs in existing tests)
  • Target version set to Current
Actions #2

Updated by tbaev about 1 year ago

  • Assignee set to tbaev
Actions #3

Updated by tbaev about 1 year ago

  • Status changed from New to In Progress
Actions #4

Updated by tbaev about 1 year ago

There is a check for whether the SUT is shut down:

os-autoinst/backend/svirt.pm:        $rsp = $self->run_ssh_cmd("! virsh $libvirt_connector dominfo $vmname | grep -w 'shut off'");

This check can fail when the Xen hypervisor starts a new VM for an unrelated openQA test with the same name (e.g. openQA-SUT-5) immediately after the VM we are testing has shut down.
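
To make the failure mode concrete, here is a minimal sketch that runs the name-based check against canned text instead of a live hypervisor. The function name `reports_running` and both `dominfo` samples are illustrative assumptions, not os-autoinst code; only the `! ... | grep -w 'shut off'` idea comes from the line quoted above.

```shell
# Hypothetical stand-in for the quoted pipeline: reads `virsh dominfo` style
# text on stdin; exit status 0 means "shut off" was NOT found.
reports_running() {
    ! grep -qw 'shut off'
}

# Abridged, made-up dominfo text for the VM under test after it powered off.
our_vm='Id:             -
Name:           openQA-SUT-5
State:          shut off'

# Made-up dominfo text for an unrelated VM that reused the name in the meantime.
new_vm='Id:             9
Name:           openQA-SUT-5
State:          running'

printf '%s\n' "$our_vm" | reports_running || echo "our VM: shut off"
printf '%s\n' "$new_vm" | reports_running && echo "same name, new VM: still running"
```

Because the lookup is by name only, the second case cannot be told apart from "the VM under test never shut down".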

Possible solutions:

  1. Check not by $vmname but by the Xen ID from virsh dominfo. The ID is different after the VM is started again.
  2. If the VM is not reported as shut down, check its uptime with xl uptime $vmname; an uptime of less than ~30 seconds would indicate a newly started VM.

I think option 1 is better; I am currently trying to implement it.
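
A rough sketch of what option 1 could look like, assuming the Xen ID of the VM was recorded when it was started. The helper names `dominfo_id` and `shut_down_by_id` are hypothetical and the input is `virsh dominfo` style text on stdin; this is not the os-autoinst implementation.

```shell
# Hypothetical helper: print the value of the "Id:" field from
# `virsh dominfo` style text read on stdin.
dominfo_id() {
    awk '$1 == "Id:" { print $2 }'
}

# Hypothetical check: $1 is the ID recorded when our VM was started.
# Succeeds if that VM is gone, i.e. the name now maps to no running
# domain ("-"), to no domain at all (empty), or to a different ID.
shut_down_by_id() {
    current=$(dominfo_id)
    [ "$current" != "$1" ]
}

# Example: our VM had ID 4; the name is now held by a new VM with ID 9.
printf 'Id:             9\nName:           openQA-SUT-5\nState:          running\n' |
    shut_down_by_id 4 && echo "original VM (ID 4) is gone"
```

The trade-off is that the recorded ID has to be captured reliably at start time; if it is missing, this check treats every VM as gone.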

Actions #5

Updated by tbaev about 1 year ago

My theory is that the "shutdown" check for VMs can fail because we check whether a VM named "X" is shut down. Before the check runs, a different openQA test can start on the hypervisor and run a VM with the same name.

I am trying to get the VM by ID and check whether it is shut down by ID.
But this logic is in os-autoinst.git and I am a little lost; this change seems better suited to the tools team.

Two files of interest are
os-autoinst/backend/svirt.pm sub is_shutdown
and
os-autoinst/consoles/sshVirtsh.pm #define the new domain

Actions #6

Updated by favogt about 1 year ago

My theory is that the "shutdown" check for VMs can fail because we check whether a VM named "X" is shut down. Before the check runs, a different openQA test can start on the hypervisor and run a VM with the same name.

Can such a name reuse actually happen? The name should be machine and worker number specific.

Actions #7

Updated by tbaev about 1 year ago

I am not sure how VIRSH_INSTANCE is defined, but when I log in to a hypervisor it is usually a single-digit number (openQA-SUT-*) and different from the VM ID:

unreal6:~ # virsh list
 Id   Name           State
------------------------------
 0    Domain-0       running
 3    openQA-SUT-8   running
 4    openQA-SUT-5   running

When openQA is busy with a lot of tests, the VM openQA-SUT-8 can be shut down and a different test on the same hypervisor can start a VM with the same name, openQA-SUT-8.

It happens rarely, but it makes the shutdown check fail. What I have noticed while monitoring manually is that the new VM can have the same name, openQA-SUT-8, but will have a different ID.
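
The manual observation can be sketched as a comparison of two `virsh list` snapshots. The helper `id_for_name` is hypothetical, and the second snapshot (including the ID 6) is made up for illustration; only the first snapshot matches the output shown above.

```shell
# Hypothetical helper: print the ID shown for domain name $1 in
# `virsh list` style text on stdin (the two header lines are skipped).
id_for_name() {
    awk -v name="$1" 'NR > 2 && $2 == name { print $1 }'
}

before=' Id   Name           State
------------------------------
 0    Domain-0       running
 3    openQA-SUT-8   running
 4    openQA-SUT-5   running'

# Made-up later snapshot: openQA-SUT-8 was shut down and another job
# started a new VM under the same name.
after=' Id   Name           State
------------------------------
 0    Domain-0       running
 4    openQA-SUT-5   running
 6    openQA-SUT-8   running'

old=$(printf '%s\n' "$before" | id_for_name openQA-SUT-8)
new=$(printf '%s\n' "$after" | id_for_name openQA-SUT-8)
if [ "$old" != "$new" ]; then
    echo "openQA-SUT-8 was replaced: ID $old -> $new"
fi
```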

Actions #8

Updated by favogt about 1 year ago

The os-autoinst doc says:

If you add multiple instances, be sure to assign a different `VIRSH_INSTANCE`
and `VIRSH_MAC`. For more details about the variables, checkout
[backend_vars.asciidoc](backend_vars.asciidoc).
Actions #9

Updated by tbaev about 1 year ago

  • Priority changed from Normal to Low
Actions #10

Updated by tbaev about 1 year ago

  • Assignee deleted (tbaev)

The latest test runs don't have the issue. After discussion in the daily meeting: this happens rarely, so we will change the priority to Low and continue observing.

Actions #11

Updated by JERiveraMoya almost 1 year ago

  • Tags deleted (qem, qe-yam-refinement)
  • Status changed from In Progress to New
  • Target version deleted (Current)

Moving to backlog to check in the future.

Actions #12

Updated by JERiveraMoya 10 months ago

  • Status changed from New to Rejected

Backlog clean-up.
