action #126647

closed

[qe-core] test fails in bootloader_start - we should use br0 not ovs-system

Added by hjluo about 1 year ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2023-03-27
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

While fixing an issue on sle-15-SP5-Online-x86_64-guided_ext4@svirt-xen-pv, the test fails in bootloader_start with the error message shown below.

Test suite description

Guided Partitioning installation with ext4 filesystem.

error message:

# Test died: {
  "cmd" => "backend_proxy_console_call",
  "json_cmd_token" => "lZptbtVI",
  "wantarray" => undef,
  "console" => "svirt",
  "function" => "define_and_start",
  "args" => []
}
virsh start failed: 1

virsh domain XML:
<domain type='xen'>
  <name>openQA-SUT-8</name>
  <uuid>6a5cd9cb-74d7-4780-b4b9-93734a361337</uuid>
  <description>openQA WebUI: openqa.suse.de (8): 10796948-sle-15-SP5-Online-x86_64-Buildhjluo_os-autoinst-distri-opensuse_stop_time_xen-guided_ext4@hjluo_os-autoinst-distri-opensuse_stop_time_xen@svirt-xen-pv</description>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64'>linux</type>
    <kernel>/usr/lib/grub2/x86_64-xen/grub.xen</kernel>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/SLE-15-SP5-Online-x86_64-Build81.1-Media1.iso'/>
      <target dev='sda' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/var/lib/libvirt/images/openQA-SUT-8b.img'/>
      <target dev='xvdb' bus='xen'/>
    </disk>
    <controller type='xenbus' index='0'/>
    <controller type='scsi' index='0'/>
    <interface type='bridge'>
      <mac address='00:16:3e:7d:4b:b7'/>
      <source bridge='ovs-system'/>
      <virtualport type='openvswitch'>  <======

Reproducible

Fails since (at least) Build hjluo/os-autoinst-distri-opensuse#stop_time_xen (current job)

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

We have had a similar ticket before:
https://progress.opensuse.org/issues/116644
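
One way to confirm which bridge a guest is attached to is to pull the `source bridge` attribute out of the domain XML. The following is only a sketch: `domain_bridges` is a hypothetical helper name, and it assumes single-quoted attributes as in the dump above.

```shell
#!/bin/bash
# Print every bridge named in <source bridge='...'/> elements of a libvirt
# domain XML read from stdin. Hypothetical helper, not part of openQA.
domain_bridges() {
    sed -n "s/.*<source bridge='\([^']*\)'.*/\1/p"
}

# Example with an inline fragment taken from the dump above; on a real host
# one might pipe in the output of `virsh dumpxml openQA-SUT-8` instead.
domain_bridges <<'EOF'
<interface type='bridge'>
  <mac address='00:16:3e:7d:4b:b7'/>
  <source bridge='ovs-system'/>
</interface>
EOF
```

For the failing job this prints `ovs-system`, whereas the ticket expects `br0`.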


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #125783: [jeos] Test fails in kdump_and_crash on SLE 12sp5 and 15sp4 XEN after worker migration from SLES to Leap 15.4 (Resolved, okurz, 2023-03-10)
Related to openQA Infrastructure - action #127097: [alert] Failed systemd services alert (Resolved, mkittler, 2023-04-03)

Actions #1

Updated by szarate about 1 year ago

  • Related to action #125783: [jeos] Test fails in kdump_and_crash on SLE 12sp5 and 15sp4 XEN after worker migration from SLES to Leap 15.4 added
Actions #2

Updated by hjluo about 1 year ago

  • Target version set to QE-Core: Ready
Actions #3

Updated by rfan1 about 1 year ago

  • Status changed from New to Feedback
  • Assignee set to rfan1

I just destroyed the iface as a temporary workaround:

 # virsh iface-destroy ovs-system
Interface ovs-system destroyed

Actions #4

Updated by okurz about 1 year ago

  • Subject changed from test fails in bootloader_start - we should use br0 not ovs-system to [qe-core] test fails in bootloader_start - we should use br0 not ovs-system
Actions #5

Updated by rfan1 about 1 year ago

I added a systemd service that stops the ovs-system iface if it is active:

 # cat /etc/systemd/system/stop-iface-ovs-system.service
[Unit]
Description=stop iface 'ovs-system'
Requires=libvirtd.service
After=libvirtd.service

[Service]
ExecStart=/bin/bash /usr/sbin/stop-iface-ovs-system.sh

[Install]
WantedBy=multi-user.target

 # cat /usr/sbin/stop-iface-ovs-system.sh
#!/bin/bash
set -x
virsh iface-list --all | grep -w active | awk '{ print $1 }' | grep ovs-system 
if [ $? -eq 0 ]; then
    virsh iface-destroy ovs-system && echo "stopped ovs-system" > /tmp/destroy_ovs-system
else
   echo "ovs-system is not active on this host" > /tmp/destroy_ovs-system   
fi

Actions #6

Updated by rfan1 about 1 year ago

  • Related to action #127097: [alert] Failed systemd services alert added
Actions #7

Updated by rfan1 about 1 year ago

There was a coding style issue that caused a non-zero return code even when the iface had been destroyed.

I have fixed it:

# cat /usr/sbin/stop-iface-ovs-system.sh
#!/bin/bash
# poo#126647
set -x
virsh iface-list --all | grep -w active | awk '{ print $1 }' | grep ovs-system 
if [ $? -eq 0 ]; then
    virsh iface-destroy ovs-system
    echo "stopped ovs-system" > /tmp/destroy_ovs-system
else
   echo "ovs-system is not active on this host" > /tmp/destroy_ovs-system   
fi

 # /usr/sbin/stop-iface-ovs-system.sh
+ virsh iface-list --all
+ grep -w active
+ grep ovs-system
+ awk '{ print $1 }'
+ '[' 1 -eq 0 ']'
+ echo 'ovs-system is not active on this host'
 # echo $?
0
 # virsh iface-start ovs-system 
Interface ovs-system started
 # /usr/sbin/stop-iface-ovs-system.sh
+ virsh iface-list --all
+ grep -w active
+ grep ovs-system
+ awk '{ print $1 }'
ovs-system
+ '[' 0 -eq 0 ']'
+ virsh iface-destroy ovs-system
Interface ovs-system destroyed

+ echo 'stopped ovs-system'
# echo $?
0
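
For reference, the same check can be written without the `grep | awk | grep` chain and the `$?` test. This is a minimal sketch and not the script deployed on the host: the function name `stop_iface_if_active` is hypothetical, and it assumes the `Name State MAC Address` column layout that `virsh iface-list --all` prints.

```shell
#!/bin/bash
# Sketch only: stop_iface_if_active is a hypothetical name. The only external
# commands used are `virsh iface-list --all` and `virsh iface-destroy`.
stop_iface_if_active() {
    local iface=$1
    # Match the name column exactly and require the state column to be
    # "active"; awk exits 0 only if such a row was seen.
    if virsh iface-list --all |
        awk -v n="$iface" '$1 == n && $2 == "active" { found = 1 } END { exit !found }'; then
        # The function returns the exit status of iface-destroy, so a real
        # failure is still visible to the caller.
        virsh iface-destroy "$iface"
    else
        echo "$iface is not active on this host"
    fi
}

# Usage (as root on the Xen host):
# stop_iface_if_active ovs-system
```

Matching on exact column values avoids false positives from interface names that merely contain `ovs-system`, and the inactive case still returns 0.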
Actions #8

Updated by rfan1 about 1 year ago

  • Status changed from Feedback to Resolved
Actions #9

Updated by rfan1 10 months ago

  • Status changed from Resolved to Feedback
Actions #10

Updated by nicksinger 10 months ago

  • Status changed from Feedback to New

As discussed in slack (https://suse.slack.com/archives/C02CANHLANP/p1687167780103479?thread_ts=1687166022.818889&cid=C02CANHLANP) this systemd unit is just a workaround to reduce priority. Please ensure that the correct interface is used in the first place.

Actions #11

Updated by rfan1 10 months ago

  • Status changed from New to In Progress
Actions #13

Updated by rfan1 10 months ago

  • Tags set to bugbusters
Actions #14

Updated by rfan1 10 months ago

  • Status changed from In Progress to Feedback

The PR is merged.

I will wait for the next Xen server reboot to check that it works.
Then I will delete the systemd service I added earlier.

Actions #15

Updated by rfan1 10 months ago

  • Status changed from Feedback to Resolved

Removed the old service:

# rm /etc/systemd/system/stop-iface-ovs-system.service
# rm /usr/sbin/stop-iface-ovs-system.sh
# systemctl daemon-reload 
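
If the unit had been enabled, deleting the file alone can leave a dangling symlink under `multi-user.target.wants`. A possible cleanup order is sketched below, on the assumption that the unit was enabled; `remove_unit` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: disable a unit before deleting its files so that no
# dangling "wants" symlink is left behind.
remove_unit() {
    local unit_file=$1 script=$2
    local unit
    unit=$(basename "$unit_file")
    # `disable --now` stops the unit and removes its enablement symlinks.
    systemctl disable --now "$unit"
    rm -f "$unit_file" "$script"
    # Make systemd forget the deleted unit file.
    systemctl daemon-reload
}

# Usage (as root):
# remove_unit /etc/systemd/system/stop-iface-ovs-system.service \
#             /usr/sbin/stop-iface-ovs-system.sh
```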