action #98835

open

arm jobs failing (again?) with auto_review:"backend died: Open vSwitch command 'set_vlan' with arguments .*was not provided by any .service files":retry

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2021-09-17
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://openqa.suse.de/tests/7144739
incompletes with

Reason: backend died: Open vSwitch command 'set_vlan' with arguments 'tap18 39' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files 

in the log file:

[2021-09-17T01:36:49.797 CEST] [debug] starting: /usr/bin/qemu-system-aarch64 -device virtio-gpu-pci -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -m 1024 -machine virt,usb=off,gic-version=3,its=off -cpu host -netdev tap,id=qanet0,ifname=tap18,script=no,downscript=no -device virtio-net,netdev=qanet0,mac=52:54:00:12:02:30 -boot menu=on,splash-time=5000 -device nec-usb-xhci -device usb-tablet -device usb-kbd -smp 1 -enable-kvm -no-shutdown -vnc :109,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console1,path=virtio_console1,logfile=virtio_console1.log,logappend=on -device virtconsole,chardev=virtio_console1,name=org.openqa.console.virtio_console1 -chardev socket,path=qmp_socket,server,nowait,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -blockdev driver=file,node-name=hd0-overlay0-file,filename=/var/lib/openqa/pool/19/raid/hd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on -device virtio-blk-device,id=hd0-device,drive=hd0-overlay0,bootindex=0,serial=hd0 -blockdev driver=file,node-name=cd0-overlay0-file,filename=/var/lib/openqa/pool/19/raid/cd0-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=cd0-overlay0,file=cd0-overlay0-file,cache.no-flush=on -device scsi-cd,id=cd0-device,drive=cd0-overlay0,serial=cd0 -drive id=pflash-code-overlay0,if=pflash,file=/var/lib/openqa/pool/19/raid/pflash-code-overlay0,unit=0,readonly=on -drive id=pflash-vars-overlay0,if=pflash,file=/var/lib/openqa/pool/19/raid/pflash-vars-overlay0,unit=1
[2021-09-17T01:36:49.826 CEST] [debug] Waiting for 0 attempts
[2021-09-17T01:36:50.830 CEST] [debug] Waiting for 1 attempts
[2021-09-17T01:36:51.832 CEST] [debug] Finished after 2 attempts
[2021-09-17T01:36:51.981 CEST] [debug] Establishing VNC connection to localhost:6009
[2021-09-17T01:36:51.991 CEST] [debug] pointer type 1 0 640 480 -257
[2021-09-17T01:36:51.991 CEST] [debug] led state 0 1 1 -261
[2021-09-17T01:36:52.117 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  Open vSwitch command 'set_vlan' with arguments 'tap18 39' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files

This happened on openqaworker-arm-2:19. The machine has been restarted since then. Right now I can see

$ ssh openqaworker-arm-2 "sudo ovs-vsctl show | grep tap18"
        Port tap18
            Interface tap18

which looks ok.
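A minimal sketch of the same check done programmatically, assuming the text output of ovs-vsctl show has been captured as a string. The helper name has_port is hypothetical and not part of os-autoinst. It compares the port name exactly, so tap18 does not also match tap180 the way a plain grep would, and it tolerates quoted port names, which some Open vSwitch versions print. It only confirms that the port is listed; it says nothing about whether the D-Bus service org.opensuse.os_autoinst.switch was reachable at the time of the failure.

```python
def has_port(show_output: str, port: str) -> bool:
    """Return True if a 'Port <name>' line for exactly this port is present.

    Hypothetical helper: it only inspects text in the shape shown in this ticket.
    """
    for line in show_output.splitlines():
        parts = line.split()
        # Expect exactly two tokens: the keyword "Port" and the port name.
        if len(parts) == 2 and parts[0] == "Port" and parts[1].strip('"') == port:
            return True
    return False


# Sample taken from the command output quoted in this ticket.
sample = """\
        Port tap18
            Interface tap18
"""

print(has_port(sample, "tap18"))
```

On a worker, the string could come from the ssh call above. That step is left out here so the sketch stays runnable without access to the machine.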

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label .
For example, to look for ticket 12345, call openqa-query-for-job-label poo#12345
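The ticket title carries an auto_review pattern, and the reason quoted under Observation is the kind of text it is meant to catch. The following is a small sketch that checks the pattern from the title against that reason using Python's re.search. How the scripts in os-autoinst/scripts actually apply the pattern is not shown in this ticket, so the function name matches_ticket and the matching approach are assumptions made for illustration.

```python
import re

# Pattern copied from the ticket title (the quoted part after "auto_review:").
AUTO_REVIEW_PATTERN = (
    r"backend died: Open vSwitch command 'set_vlan' with arguments "
    r".*was not provided by any .service files"
)


def matches_ticket(reason: str) -> bool:
    """Return True if the reason text matches the pattern from the title.

    Hypothetical helper; the real labelling scripts may match differently.
    """
    return re.search(AUTO_REVIEW_PATTERN, reason) is not None


# Reason text quoted in the Observation section of this ticket.
reason = (
    "backend died: Open vSwitch command 'set_vlan' with arguments 'tap18 39' "
    "failed: org.freedesktop.DBus.Error.ServiceUnknown: The name "
    "org.opensuse.os_autoinst.switch was not provided by any .service files"
)

print(matches_ticket(reason))
```

The pattern leaves the tap device and VLAN number open via .*, so the same ticket is matched regardless of which worker instance hit the problem. A different set_vlan failure, such as the "not connected to bridge" message from #88807, does not match.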

Problem

Possibly an inconsistent state between the worker instance and the Open vSwitch setup, which resolved itself for now with the machine restart?

Suggestion

  • Look into the older related tickets #75274 and #88807 for suggestions we can follow or improvements we can apply.
  • Can we add a call to ovs-vsctl show to the post_fail_hooks of tests?

Workaround

Retriggering likely works if the jobs end up on other instances or machines.


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #75274: [osd-admins][alert][learning] Failed systemd services alert (workers): os-autoinst-openvswitch.service aborts retries after 60s and is not easily configurable (Resolved, livdywan, 2020-12-04)

Related to openQA Infrastructure - action #88807: Open vSwitch command 'set_vlan' with arguments 'tap41 13' failed: 'tap41' is not connected to bridge 'br1' at /usr/lib/os-autoinst/backend/qemu.pm line 152. (Resolved, mkittler, 2021-02-19)

Actions #1

Updated by okurz over 2 years ago

  • Target version set to future

Called export host=openqa.suse.de; ./openqa-monitor-incompletes | bash -e ./openqa-label-known-issues from github.com/os-autoinst/scripts, which retriggered the matching incomplete jobs, a couple in total. Good enough for now.

Actions #2

Updated by okurz over 2 years ago

  • Related to action #75274: [osd-admins][alert][learning] Failed systemd services alert (workers): os-autoinst-openvswitch.service aborts retries after 60s and is not easily configurable added
Actions #3

Updated by okurz over 2 years ago

  • Related to action #88807: Open vSwitch command 'set_vlan' with arguments 'tap41 13' failed: 'tap41' is not connected to bridge 'br1' at /usr/lib/os-autoinst/backend/qemu.pm line 152. added