action #104350
Updated by okurz almost 3 years ago
## Observation

From `journalctl -b -u os-autoinst-openvswitch.service` on grenache-1:

```
-- Logs begin at Mon 2021-12-20 21:50:34 CET, end at Sun 2021-12-26 14:26:34 CET. --
Dec 26 03:34:34 grenache-1 systemd[1]: Starting os-autoinst openvswitch helper...
Dec 26 03:35:34 grenache-1 wicked[3515]: device br1: unable to apply configuration to nanny
Dec 26 03:36:04 grenache-1 systemd[1]: os-autoinst-openvswitch.service: start-pre operation timed out. Terminating.
Dec 26 03:36:04 grenache-1 systemd[1]: os-autoinst-openvswitch.service: Control process exited, code=killed, status=15/TERM
Dec 26 03:36:04 grenache-1 systemd[1]: os-autoinst-openvswitch.service: Failed with result 'timeout'.
Dec 26 03:36:04 grenache-1 systemd[1]: Failed to start os-autoinst openvswitch helper.
Dec 26 04:38:08 grenache-1 systemd[1]: Starting os-autoinst openvswitch helper...
Dec 26 04:38:08 grenache-1 systemd[1]: Started os-autoinst openvswitch helper.
```

This briefly triggered an alert, as visible on https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1640478716612&to=1640495459276, but the failure resolved itself roughly one hour later, when the service started successfully at 04:38:08.

## Acceptance criteria

* **AC1:** The code flow in https://github.com/os-autoinst/os-autoinst/blob/master/os-autoinst-openvswitch and the corresponding systemd service have been reviewed at least once

## Suggestions

* Investigate why os-autoinst-openvswitch.service times out after about 90s (03:34:34 to 03:36:04 in the log above), when the config file in salt-states says 300s and https://github.com/os-autoinst/os-autoinst/blob/master/os-autoinst-openvswitch#L30 reads like there should be indefinite waiting time
* Check if one of the related systemd units has a retry. If not, add one or extend the timeout
* On grenache-1.qa there is already the following drop-in. It is unclear whether it is manually maintained or where it comes from:

```
# /etc/systemd/system/os-autoinst-openvswitch.service.d/override.conf
[Service]
ExecStartPre=/bin/sh -c 'command -v wicked >/dev/null && wicked ifstatus br1 | grep -q up || wicked ifup br1'
```
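
For the suggestion to add a retry or extend the timeout, a drop-in along the following lines might serve as a starting point. This is only a sketch: the file name `timeout-retry.conf` and the values 300s and 30s are assumptions for illustration, not taken from salt-states or the os-autoinst repository. The start-pre step in the log is terminated 90s after start, which matches systemd's stock `DefaultTimeoutStartSec`, so it may be worth first confirming the effective settings with `systemctl show -p TimeoutStartUSec,Restart os-autoinst-openvswitch.service` and `systemctl cat os-autoinst-openvswitch.service`.

```ini
# /etc/systemd/system/os-autoinst-openvswitch.service.d/timeout-retry.conf
# (hypothetical file name; values below are illustrative assumptions)
[Service]
# Give the start-up phase, including ExecStartPre, up to 300s
# instead of whatever limit is currently in effect
TimeoutStartSec=300
# Retry automatically when the unit fails, which includes start timeouts
Restart=on-failure
# Wait 30s between attempts so br1 has time to come up
RestartSec=30
```

After adding or changing a drop-in, `systemctl daemon-reload` is needed before the new settings apply. If the unit is managed via salt, the change would presumably belong there rather than in a manually placed file.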