action #59300

auto_review:"DBus.*: The name org.opensuse.os_autoinst.switch was not provided" GRE tunnel settings not applied on initial setup / after reboot

Added by okurz 5 months ago. Updated 4 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/11/2019
Due date:
% Done: 0%
Duration:

Description

Observation

See #32605#note-13:
https://openqa.suse.de/tests/3516989 failed on openqaworker10 while the parallel reference job was running on openqaworker8. This seems to be a problem with the GRE tunnel: on openqaworker10 the configuration had been correctly written to the config files, but it was not part of the active config. Running wicked ifup br1 fixed this, and the GRE config now shows up in ovs-vsctl show. salt-states-openqa correctly includes a wicked ifup br1, so I am not sure whether this is a generic problem or a single incident caused by the salt state being applied incorrectly on the worker. A reboot would probably have fixed it just as well. I should check how the system behaves during a reboot.

Reproducible

To be confirmed, e.g. in a clean VM test install.

Problem

https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/openvswitch.sls#L17 is supposed to call wicked ifup br1, but perhaps that call did not take effect when the GRE tunnel config was applied. The same or a similar problem can also occur on a subsequent reboot, e.g. as reported in #59300#note-3.

Suggestions

  • Apply salt states in a clean environment, e.g. new worker install or test VM
  • Call ovs-vsctl show and check whether any entries for GRE remote IPs show up, e.g.:
    sudo ovs-vsctl show | grep -B 3 'options.*remote_ip'
        Port "gre8"
            Interface "gre8"
                type: gre
                options: {remote_ip="10.160.1.20"}
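The same check can be wrapped in a small helper that prints only the GRE remote IPs, which makes it easier to compare them against the expected peers (e.g. the worker running the parallel job). A minimal sketch: the function name gre_remote_ips is hypothetical, and it assumes the options: {remote_ip="..."} line format shown above.

```shell
# Hypothetical helper: read "ovs-vsctl show" output on stdin and print one
# GRE remote IP per line, based on the "options: {remote_ip=...}" format.
gre_remote_ips() {
    sed -n 's/.*options:.*remote_ip="\([^"]*\)".*/\1/p'
}

# Usage (on a worker):
#   sudo ovs-vsctl show | gre_remote_ips
# An empty result means no GRE tunnel is present in the active config.
```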

Workaround

Manually call wicked ifup br1 (or the corresponding bridge name) after the openvswitch config has been applied.
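The manual workaround can be scripted so that it also confirms the result, i.e. that a GRE port with a remote_ip appears in the active Open vSwitch config afterwards. A minimal sketch, assuming it runs as root after the openvswitch salt state has been applied; the function name bring_up_gre_bridge is hypothetical and br1 is only the default bridge name.

```shell
# Hypothetical helper for the manual workaround: bring the bridge up with
# wicked, then check that some GRE remote_ip shows up in the active
# Open vSwitch config (the check covers all bridges, not only $bridge).
bring_up_gre_bridge() {
    bridge=${1:-br1}
    wicked ifup "$bridge" || return 1
    if ovs-vsctl show | grep -q 'options:.*remote_ip='; then
        echo "GRE ports active after wicked ifup $bridge"
    else
        echo "no GRE remote_ip found after wicked ifup $bridge" >&2
        return 1
    fi
}

# Usage (as root, after the openvswitch salt state has been applied):
#   bring_up_gre_bridge br1
```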

History

#1 Updated by okurz 4 months ago

  • Description updated

#2 Updated by okurz 4 months ago

I just reproduced this problem on arm1 and arm3, where I had added the "tap" worker class and bridge_ip to the worker settings. The salt recipes were applied, but both machines still needed one additional wicked ifup br1.

#3 Updated by okurz 4 months ago

  • Subject changed from GRE tunnel settings seems to be not applied on initial setup, wicked ifup br1 is in salt but maybe not called to GRE tunnel settings seems to be not applied on initial setup and sporadically after reboot, wicked ifup br1 is in salt but maybe not called

Problem observed by @sebchlad: "have you seen problems like this? https://openqa.suse.de/tests/3710031". Test jobs failed with "org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files" because the service os-autoinst-openvswitch on openqaworker-arm-1 had failed after bootup, and we have no alert for this case. Looking into it together, it seems that after openqaworker-arm-1 was automatically rebooted following a crash, it needed a manual "wicked ifup br1 && systemctl restart os-autoinst-openvswitch". I suspect that https://github.com/os-autoinst/os-autoinst/blob/master/systemd/os-autoinst-openvswitch.service.in is missing a dependency. Which service should ensure that "wicked ifup br1" actually succeeded?
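One possible direction for the dependency question, not verified on the workers: order os-autoinst-openvswitch after network-online.target with a systemd drop-in. This only helps if network-online.target is reached after wicked has fully configured br1, which is an assumption here. The drop-in file name and the helper name install_ovs_ordering_dropin are hypothetical, and UNIT_DIR exists only so the sketch can be tried outside /etc.

```shell
# Hypothetical helper: write a drop-in that makes os-autoinst-openvswitch
# pull in network-online.target and start after it. UNIT_DIR defaults to the
# directory systemd uses for administrator-provided unit files.
install_ovs_ordering_dropin() {
    dir="${UNIT_DIR:-/etc/systemd/system}/os-autoinst-openvswitch.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage (as root); the new ordering takes effect from the next boot:
#   install_ovs_ordering_dropin && systemctl daemon-reload
```

Wants= is listed in addition to After= because After= alone only orders the service relative to network-online.target if some other unit pulls that target in.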

From the logs on openqaworker-arm-1 (journalctl -b):

Dec 13 11:04:25 openqaworker-arm-1 wickedd-nanny[1676]: device br1: failed to bind services and methods for waitDeviceReady()
Dec 13 11:04:28 openqaworker-arm-1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Dec 13 11:04:28 openqaworker-arm-1 ovs-vsctl[2136]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-port br1 tap0
Dec 13 11:04:28 openqaworker-arm-1 ovs-vsctl[2138]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-port br1 tap1
Dec 13 11:04:28 openqaworker-arm-1 kernel: nicvf 0002:01:00.1 eth0: Link is Up 10000 Mbps Full duplex
Dec 13 11:04:28 openqaworker-arm-1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

It seems that, especially on arm, eth0 takes longer to come up, so the bridge is configured before eth0 is up.
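To check whether the same ordering occurs on other workers or after later reboots, the boot journal can be scanned for the first br1 message and the first eth0 "Link is Up" message. A rough sketch: the function name bridge_before_link is hypothetical, and the patterns are taken from the excerpt above, so they may need adjusting for other interface names or log formats.

```shell
# Hypothetical helper: read journal text on stdin and succeed if the first
# line mentioning br1 comes before the first "eth0: Link is Up" line, i.e.
# the ordering seen in the excerpt above. Fails if either line is missing.
bridge_before_link() {
    awk '
        !br && / br1[: ]/ { br = NR }
        !up && /eth0: Link is Up/ { up = NR }
        END { exit !(br && up && br < up) }
    '
}

# Usage (current boot, or -b -1 for the previous one):
#   journalctl -b | bridge_before_link && echo "br1 was set up before eth0 had link"
```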

I am thinking of adding a workaround to the os-autoinst-openvswitch service override file that runs wicked ifstatus br1 | grep -q up || wicked ifup br1, as ExecStartPre=/bin/sh -c 'command -v wicked >/dev/null && wicked ifstatus br1 | grep -q up || wicked ifup br1'

-> https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/238/
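In sh, && and || have equal precedence and are evaluated left to right, so A && B || C runs C whenever A fails. For the ExecStartPre command above this means that on a host without wicked, wicked ifup br1 is still attempted, fails, and would make ExecStartPre (and therefore the service start) fail. A sketch that skips the check explicitly when wicked is missing; the function name ensure_bridge_up is hypothetical, and whether the merge request above already accounts for this is not checked here.

```shell
# Hypothetical pre-start guard mirroring the proposed ExecStartPre command,
# with the "wicked not installed" case handled explicitly.
ensure_bridge_up() {
    bridge=${1:-br1}
    # No wicked on this host: nothing to do, do not fail the service start.
    command -v wicked >/dev/null 2>&1 || return 0
    # Bridge already reported as up (same substring check as the original).
    wicked ifstatus "$bridge" | grep -q up && return 0
    wicked ifup "$bridge"
}

# Equivalent one-liner for a drop-in:
#   ExecStartPre=/bin/sh -c 'if command -v wicked >/dev/null; then wicked ifstatus br1 | grep -q up || wicked ifup br1; fi'
```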

#4 Updated by okurz 4 months ago

  • Subject changed from GRE tunnel settings seems to be not applied on initial setup and sporadically after reboot, wicked ifup br1 is in salt but maybe not called to auto_review:"DBus.*: The name org.opensuse.os_autoinst.switch was not provided" GRE tunnel settings not applied on initial setup / after reboot
  • Description updated
