action #116257

open

[virtualization][svirt] Some workers in openqaworker2 time out while copying the assets in bootloader_svirt module

Added by jlausuch over 1 year ago. Updated about 1 month ago.

Status: New
Priority: High
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2022-09-06
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP5-JeOS-for-kvm-and-xen-Updates-x86_64-jeos-extratest@svirt-xen-hvm fails in bootloader_svirt

It hits the MAX_JOB_TIMEOUT while trying to copy the image.

The affected workers are:
openqaworker2:9
openqaworker2:10
openqaworker2:16

Most jobs using these workers time out during this step. Other examples:
https://openqa.suse.de/tests/9459036
https://openqa.suse.de/tests/9459031
https://openqa.suse.de/tests/9459037
https://openqa.suse.de/tests/9459064
https://openqa.suse.de/tests/9459069

Reproducible

Fails since (at least) Build 20220905-1 (current job)

Expected result

Last good: 20220903-1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #116644: [qe-core][functional][sle15sp5]test fails in bootloader_svirt, the test is using different network bridge 'ovs-system' rather than 'br0' (Resolved, rfan1, 2022-09-16)

Actions #1

Updated by jlausuch over 1 year ago

I have stopped and masked the affected worker services on openqaworker2:

systemctl mask --now openqa-worker-auto-restart@{9,10,16}.service
systemctl mask --now openqa-reload-worker-auto-restart@{9,10,16}.service
systemctl mask --now openqa-reload-worker-auto-restart@{9,10,16}.path
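For reference, a minimal sketch of how the masking above could be reverted once the workers are healthy again. The helper name unmask_workers and the RUN variable are hypothetical, and starting the worker service together with the reload path unit is an assumption about how these units are normally run on the host. By default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: revert "systemctl mask --now" for the given worker instances.
# RUN defaults to "echo" (dry run); set RUN="" to actually execute the commands.
RUN="${RUN-echo}"

unmask_workers() {
    for i in "$@"; do
        # Remove the mask from the same three units that were masked above.
        $RUN systemctl unmask \
            "openqa-worker-auto-restart@$i.service" \
            "openqa-reload-worker-auto-restart@$i.service" \
            "openqa-reload-worker-auto-restart@$i.path"
        # Assumption: the worker service and the reload path unit are the ones to start.
        $RUN systemctl start \
            "openqa-worker-auto-restart@$i.service" \
            "openqa-reload-worker-auto-restart@$i.path"
    done
}
```

Calling unmask_workers 9 10 16 prints the commands for the three affected instances; with RUN="" in the environment they would be executed instead.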
Actions #2

Updated by jlausuch over 1 year ago

I tried to bring the workers up again, but the job https://openqa.suse.de/tests/9459087 still fails, so this needs more investigation.

Actions #3

Updated by jlausuch over 1 year ago

I have rebooted the openqaw5-xen machine and enabled libvirtd (systemctl enable libvirtd). I have also restarted the 3 workers on the openqaworker2 host.

Now the bootloader_svirt module at least passes that step, but it fails in another place:
https://openqa.suse.de/tests/9459108#step/bootloader_svirt/31
https://openqa.suse.de/tests/9459106#step/bootloader_svirt/33
https://openqa.suse.de/tests/9459107#step/bootloader_svirt/31

virsh start failed at /usr/lib/os-autoinst/consoles/sshVirtsh.pm line 546.
   at /usr/lib/os-autoinst/backend/console_proxy.pm line 46.
    backend::console_proxy::__ANON__(undef) called at sle/tests/installation/bootloader_svirt.pm line 282
    bootloader_svirt::run(bootloader_svirt=HASH(0x55b912784ae8)) called at /usr/lib/os-autoinst/basetest.pm line 328
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 322
    basetest::runtest(bootloader_svirt=HASH(0x55b912784ae8)) called at /usr/lib/os-autoinst/autotest.pm line 367
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 367
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 243
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 243
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 294
    autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x55b914915538)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
    Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x55b914915538), CODE(0x55b914a8e628)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
    Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x55b914915538)) called at /usr/lib/os-autoinst/autotest.pm line 296
    autotest::start_process() called at /usr/bin/isotovideo line 273
Actions #4

Updated by jlausuch over 1 year ago

Difference in the network interface definition between a passed job:

      <interface type="network">
        <mac address="00:16:3e:68:97:43"/>
        <model type="netfront"/>
        <source network="br0"/>
        <virtualport type="openvswitch"/>
      </interface>

and failed job:

      <interface type="bridge">
        <mac address="00:16:3e:79:99:8b"/>
        <virtualport type="openvswitch"/>
        <source bridge="ovs-system"/>
        <model type="netfront"/>
      </interface>

For some reason, the failing jobs use <source bridge="ovs-system"/> instead of <source network="br0"/>.
I have checked openvswitch.service and it's active, br0 is there and active too.
Not sure what's going on.
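
To compare the two definitions quickly across several jobs, the interface source can be pulled out of a domain XML. This is only a sketch: the function name iface_source is hypothetical, and it assumes the XML is laid out one element per line, as in the snippets above.

```shell
#!/bin/sh
# Hypothetical helper: read a libvirt domain XML on stdin and print only the
# <source .../> element(s) found inside <interface> ... </interface> blocks.
# Disk <source> elements are skipped because they sit outside that range.
iface_source() {
    sed -n '/<interface /,/<\/interface>/s/^[[:space:]]*\(<source [^>]*>\).*/\1/p'
}
```

For example, virsh dumpxml openQA-SUT-2 | iface_source on the Xen host, or iface_source < /var/lib/libvirt/images/openQA-SUT-2.xml, shows at a glance whether a domain was generated with network="br0" or bridge="ovs-system".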

Actions #5

Updated by jlausuch over 1 year ago

Trying to run steps manually:

openqaw5-xen:~ # virsh define /var/lib/libvirt/images/openQA-SUT-2.xml
Domain openQA-SUT-2 defined from /var/lib/libvirt/images/openQA-SUT-2.xml

openqaw5-xen:~ # virsh  start openQA-SUT-2
error: Failed to start domain openQA-SUT-2
error: internal error: libxenlight failed to create new domain 'openQA-SUT-2'

Maybe this host doesn't have VT-x enabled after reboot?

openqaw5-xen:~ # virt-host-validate 
  QEMU: Checking for hardware virtualization                                 : FAIL (Only emulated CPUs are available, performance will be significantly limited)
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
WARN (Unknown if this platform has IOMMU support)
  QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'freezer' controller support                     : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : FAIL (Load the 'fuse' module to enable /proc/ overrides)

But if it was working before, why wouldn't it survive a simple reboot?
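
One caveat, stated as an assumption rather than a finding: the QEMU checks of virt-host-validate target KVM, and on a Xen dom0 the CPU virtualization flags are typically not visible to the dom0 kernel, so the FAIL above may not mean that VT-x is disabled. The hypervisor's own view can be taken from xl info instead. A minimal sketch (the function name xen_has_hvm is hypothetical) that checks the virt_caps line:

```shell
#!/bin/sh
# Hypothetical helper: read `xl info` output on stdin and succeed (exit 0)
# only if the virt_caps line lists "hvm" as a separate word.
xen_has_hvm() {
    grep '^virt_caps' | grep -qw 'hvm'
}
```

Usage on the Xen host: xl info | xen_has_hvm && echo "HVM available". If this fails after the reboot, the firmware setting would be the next thing to check; if it succeeds, the libxenlight error probably has another cause.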

Actions #6

Updated by jlausuch over 1 year ago

ovs-vsctl show
bc0b58fc-7882-4a8c-ba39-69ffc77671b0
    Manager "--help"
    Bridge br0
        Port vif2.0
            Interface vif2.0
        Port vif2.0-emu
            Interface vif2.0-emu
                error: "could not open network device vif2.0-emu (No such device)"
        Port br0
            Interface br0
                type: internal
        Port eth0
            Interface eth0
    ovs_version: "2.13.2"

openqaw5-xen:~ # virsh iface-list --all
 Name         State    MAC Address
------------------------------------
 ovs-system   active   

openqaw5-xen:~ # virsh iface-dumpxml ovs-system
error: An error occurred, but the cause is unknown
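
The stale vif2.0-emu port is easy to miss in longer listings. A small sketch for picking out interfaces that Open vSwitch reports an error for (the function name ovs_broken_ifaces is hypothetical; it assumes the unquoted output format shown above):

```shell
#!/bin/sh
# Hypothetical helper: read `ovs-vsctl show` output on stdin and print the name
# of every interface that is followed by an "error:" line.
ovs_broken_ifaces() {
    awk '$1 == "Interface" { iface = $2 }
         $1 == "error:"    { print iface }'
}
```

Usage: ovs-vsctl show | ovs_broken_ifaces. A port listed there could then be removed with ovs-vsctl del-port br0 <port>, although whether this stale port is related to the failure is not confirmed.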
Actions #7

Updated by rfan1 over 1 year ago

  • Related to action #116644: [qe-core][functional][sle15sp5]test fails in bootloader_svirt, the test is using different network bridge 'ovs-system' rather than 'br0' added
Actions #8

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-extratest@svirt-xen-hvm
https://openqa.suse.de/tests/9469862#step/bootloader_svirt/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #9

Updated by slo-gin over 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #10

Updated by slo-gin over 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #11

Updated by okurz over 1 year ago

  • Subject changed from [svirt] Some workers in openqaworker2 time out while copying the assets in bootloader_svirt module to [virtualization][svirt] Some workers in openqaworker2 time out while copying the assets in bootloader_svirt module
Actions #12

Updated by slo-gin over 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #13

Updated by slo-gin over 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #14

Updated by slo-gin about 1 year ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #15

Updated by slo-gin about 1 month ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
