action #162284
Prevent multi-machine tests to be picked up if os-autoinst-openvswitch service does not work size:M
Description
Observation
During the unintended upgrade of worker31 and other workers to Leap 15.6, the network only came up after 20 minutes(!) for reasons that are still unknown, see #157975. Additional problems arose because os-autoinst-openvswitch eventually timed out, yet the openQA workers kept picking up jobs and happily broke them with:
backend died: Open vSwitch command 'set_vlan' with arguments 'tap37 121' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files
Fabian Vogt suggests: "Instead of relying on the systemd .service to be enabled and running you could add a dbus .service file with a SystemdService= key to make use of dbus autolaunch"
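A minimal sketch of what fvogt's suggestion could look like: a D-Bus system-bus activation file whose SystemdService= key makes dbus-daemon ask systemd to start the unit on first use. The bus name comes from the error message above; the unit name `os-autoinst-openvswitch.service`, `User=root`, and the helper function names are assumptions for illustration. The Python below only renders the file; it does not install anything.

```python
"""Render a D-Bus system-bus activation file that delegates to systemd.

Assumptions (not from the issue): the systemd unit is called
os-autoinst-openvswitch.service and the service runs as root.
"""

from pathlib import Path

BUS_NAME = "org.opensuse.os_autoinst.switch"  # taken from the error message
SYSTEMD_UNIT = "os-autoinst-openvswitch.service"  # assumed unit name


def render_activation_file(bus_name: str, systemd_unit: str, user: str = "root") -> str:
    """Return the contents of a [D-BUS Service] activation file.

    With SystemdService= set, dbus-daemon asks systemd to start the unit
    when a client calls the bus name. Exec= is still expected in the file,
    so /bin/false is used as a placeholder (a common pattern).
    """
    lines = [
        "[D-BUS Service]",
        f"Name={bus_name}",
        "Exec=/bin/false",
        f"User={user}",
        f"SystemdService={systemd_unit}",
    ]
    return "\n".join(lines) + "\n"


def activation_file_path(bus_name: str,
                         base: Path = Path("/usr/share/dbus-1/system-services")) -> Path:
    # By convention the file is named after the bus name plus ".service".
    return base / f"{bus_name}.service"


if __name__ == "__main__":
    print(activation_file_path(BUS_NAME))
    print(render_activation_file(BUS_NAME, SYSTEMD_UNIT), end="")
```

As noted in the suggestions, this still relies on systemd to actually run the service; it only removes the need for the unit to be enabled and already running. A system-bus policy permitting the service to own the name is required as well.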
Acceptance criteria
- AC1: openQA workers do not pick up multi-machine tests if the os-autoinst-openvswitch service is not available over D-Bus
- AC2: openQA workers clearly communicate this problem
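A hedged sketch of the decision the acceptance criteria describe: a worker with a multi-machine class probes the system bus for the Open vSwitch service and reports itself as "broken", with a readable reason, instead of accepting jobs. The names `WorkerStatus`, `evaluate_worker` and `dbus_name_has_owner`, and the assumption that the `tap` class marks multi-machine capability, are hypothetical and not the actual openQA worker code. The probe uses the standard `org.freedesktop.DBus.NameHasOwner` method via `dbus-send`.

```python
"""Sketch of AC1/AC2; all names here are hypothetical, not openQA's API."""

import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

OVS_BUS_NAME = "org.opensuse.os_autoinst.switch"


@dataclass
class WorkerStatus:
    state: str  # "idle" (accepts jobs) or "broken" (accepts none)
    reason: Optional[str] = None


def is_multimachine(worker_classes: Sequence[str]) -> bool:
    # Assumption: multi-machine capability is signalled by the "tap" class.
    return "tap" in worker_classes


def dbus_name_has_owner(bus_name: str) -> bool:
    """Ask the system bus whether a process currently owns bus_name.

    Returns False if dbus-send is missing, fails or times out. Note that an
    activatable but not yet started service also reports False here.
    """
    try:
        out = subprocess.run(
            ["dbus-send", "--system", "--print-reply",
             "--dest=org.freedesktop.DBus", "/org/freedesktop/DBus",
             "org.freedesktop.DBus.NameHasOwner", f"string:{bus_name}"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return False
    return "boolean true" in out


def evaluate_worker(worker_classes: Sequence[str],
                    service_available: Callable[[str], bool] = dbus_name_has_owner,
                    ) -> WorkerStatus:
    if not is_multimachine(worker_classes):
        return WorkerStatus("idle")  # no D-Bus dependency, nothing to check
    if service_available(OVS_BUS_NAME):
        return WorkerStatus("idle")
    # AC2: state the reason explicitly so it shows up wherever status is shown.
    return WorkerStatus(
        "broken",
        f"D-Bus service {OVS_BUS_NAME} is not available; "
        "not accepting multi-machine jobs",
    )
```

Injecting `service_available` keeps the decision logic testable without a running system bus.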
Suggestions
- If a worker has a multi-machine worker class, make it check whether the required D-Bus service is available (or use a similar check) and, if not, make the worker show up as "broken", a state we already use in other cases
- As an alternative, look into the suggestion by fvogt, although that would still rely on systemd
- As an alternative to declaring the worker "broken", remove the "tap" worker class dynamically or rename it to "tap_broken_$reason"
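A minimal sketch of the last alternative, assuming the scheduler matches worker classes by exact name so that a renamed class no longer satisfies a job's `tap` requirement. The function `degrade_tap_class` and the way the reason is turned into a token are hypothetical.

```python
"""Rename the "tap" class instead of marking the whole worker broken.

Hypothetical helper; assumes exact-match worker class scheduling.
"""

import re
from typing import List, Sequence


def degrade_tap_class(worker_classes: Sequence[str], reason: str) -> List[str]:
    """Return worker classes with "tap" replaced by "tap_broken_<reason>".

    The reason is lowercased and reduced to [a-z0-9_] so the class stays a
    simple token. Other classes are left untouched and order is preserved.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", reason.lower()).strip("_") or "unknown"
    return [f"tap_broken_{slug}" if c == "tap" else c for c in worker_classes]
```

For example, `degrade_tap_class(["qemu_x86_64", "tap"], "OVS D-Bus unavailable")` yields `["qemu_x86_64", "tap_broken_ovs_d_bus_unavailable"]`. Unlike "broken", this keeps the worker available for single-machine jobs, and the reason remains visible in its class list.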
Updated by okurz 5 months ago
- Copied from action #157975: Upgrade osd workers to openSUSE Leap 15.6 added
Updated by okurz 3 months ago
- Related to action #165132: test fails in openqa_worker with 'No such timeout policy "ovs_test_tp"' and other problems regarding move to scripts/ added
Updated by tinita 3 months ago
- Subject changed from Prevent multi-machine tests to be picked up if os-autoinst-openvswitch service does not work to Prevent multi-machine tests to be picked up if os-autoinst-openvswitch service does not work size:M
- Description updated
- Status changed from New to Workable
Updated by livdywan 2 months ago
- Related to action #165860: wicked_basic_ref fails on ppc64le: Open vSwitch command 'set_vlan' with arguments 'tap3 8' failed added
Updated by mkittler 2 months ago
- Status changed from Feedback to Resolved
The PR has been merged. Besides the unit tests, I did "fullstack testing" of the feature locally, so I think it will work in production when os-autoinst-openvswitch is unavailable. I also checked for false positives on o3 (where the change is already deployed) and found no related broken workers.