action #98835
openarm jobs failing (again?) with auto_review:"backend died: Open vSwitch command 'set_vlan' with arguments .*was not provided by any .service files":retry
0%
Description
Observation
https://openqa.suse.de/tests/7144739
incompletes with
Reason: backend died: Open vSwitch command 'set_vlan' with arguments 'tap18 39' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files
in the log file:
[2021-09-17T01:36:49.797 CEST] [debug] starting: /usr/bin/qemu-system-aarch64 -device virtio-gpu-pci -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -m 1024 -machine virt,usb=off,gic-version=3,its=off -cpu host -netdev tap,id=qanet0,ifname=tap18,script=no,downscript=no -device virtio-net,netdev=qanet0,mac=52:54:00:12:02:30 -boot menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 -enable-kvm -no-shutdown -vnc :109,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/19/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on -device virtio-blk-device,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/19/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/19/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/19/raid/pflash-vars-overlay0,unit=1
[2021-09-17T01:36:49.826 CEST] [debug] Waiting for 0 attempts
[2021-09-17T01:36:50.830 CEST] [debug] Waiting for 1 attempts
[2021-09-17T01:36:51.832 CEST] [debug] Finished after 2 attempts
[2021-09-17T01:36:51.981 CEST] [debug] Establishing VNC connection to localhost:6009
[2021-09-17T01:36:51.991 CEST] [debug] pointer type 1 0 640 480 -257
[2021-09-17T01:36:51.991 CEST] [debug] led state 0 1 1 -261
[2021-09-17T01:36:52.117 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
Open vSwitch command 'set_vlan' with arguments 'tap18 39' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files
on openqaworker-arm-2:19. The machine has been restarted since then. Right now I can see
$ ssh openqaworker-arm-2 "sudo ovs-vsctl show | grep tap18"
Port tap18
Interface tap18
which looks ok.
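The port being listed does not show whether the D-Bus name from the error message is currently owned by a running service. A minimal sketch of both checks follows, assuming bash and `dbus-send` are available on the worker; `has_port` and `switch_name_owned` are hypothetical helper names, and only the bus name is taken from the error text above.

```shell
#!/bin/bash
# Bus name copied from the error message in this ticket.
SWITCH_BUS_NAME="org.opensuse.os_autoinst.switch"

# Hypothetical helper: reads `ovs-vsctl show` output on stdin and succeeds
# only if the exact port name is listed (tap1 must not match tap18).
has_port() {
    local port="$1"
    grep -Eq "^[[:space:]]*Port \"?${port}\"?[[:space:]]*$"
}

# Hypothetical helper: asks the system bus whether the switch name has an owner.
# NameHasOwner is a standard method of org.freedesktop.DBus.
switch_name_owned() {
    dbus-send --system --dest=org.freedesktop.DBus --print-reply \
        /org/freedesktop/DBus org.freedesktop.DBus.NameHasOwner \
        "string:${SWITCH_BUS_NAME}" | grep -q 'boolean true'
}
```

Possible usage: `ssh openqaworker-arm-2 "sudo ovs-vsctl show" | has_port tap18 && echo "tap18 present"`, and `switch_name_owned || echo "switch service not on the bus"` when run on the worker itself.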
Steps to reproduce
Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
For example, to look for ticket 12345, call openqa-query-for-job-label poo#12345
Problem
Maybe an inconsistent worker instance and Open vSwitch setup, which resolved itself for now with the machine restart?
Suggestion
- Look into the older related tickets #75274 and #88807 for suggestions we can follow or improvements we can apply.
- Can we add a call to
ovs-vsctl show
into post_fail_hooks of tests?
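One way to capture the switch state at failure time is sketched below. It assumes the command runs on the worker host, where ovs-vsctl is available, rather than inside the SUT; how it would be wired into a post_fail_hook is left open. `collect_ovs_diagnostics` and the `OVS_VSCTL` override are hypothetical names, not part of os-autoinst.

```shell
#!/bin/bash
# Hypothetical helper: append the current Open vSwitch state to a log file.
# OVS_VSCTL may be overridden for dry runs (assumption); it defaults to ovs-vsctl.
collect_ovs_diagnostics() {
    local logfile="$1"
    {
        echo "=== ovs-vsctl show ($(date -u +%Y-%m-%dT%H:%M:%SZ)) ==="
        # Record a failure in the log instead of aborting the caller.
        "${OVS_VSCTL:-ovs-vsctl}" show 2>&1 || echo "ovs-vsctl show failed with exit code $?"
    } >> "$logfile"
    return 0
}
```

Possible usage: `collect_ovs_diagnostics ovs-state.log`. The helper always returns 0 so that a missing or failing ovs-vsctl does not mask the original test failure.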
Workaround
Retriggering likely works if jobs end up on other instances or machines.
Updated by okurz about 3 years ago
- Target version set to future
Called export host=openqa.suse.de; ./openqa-monitor-incompletes | bash -e ./openqa-label-known-issues
from github.com/os-autoinst/scripts, which retriggered the matching incompletes, a couple of jobs in total. Good enough for now.
Updated by okurz about 3 years ago
- Related to action #75274: [osd-admins][alert][learning] Failed systemd services alert (workers): os-autoinst-openvswitch.service aborts retries after 60s and is not easily configurable added
Updated by okurz about 3 years ago
- Related to action #88807: Open vSwitch command 'set_vlan' with arguments 'tap41 13' failed: 'tap41' is not connected to bridge 'br1' at /usr/lib/os-autoinst/backend/qemu.pm line 152. added
Updated by livdywan 4 months ago
- Related to action #165860: wicked_basic_ref fails on ppc64le: Open vSwitch command 'set_vlan' with arguments 'tap3 8' failed added