action #162590
Updated by livdywan 6 months ago
## Motivation

As observed in #162365, mounts on OSD do not always come up properly, and we no longer want to treat them as critical for boot. What likely happens is that nfs-server.service on OSD starts on boot even if /var/lib/openqa/share is not mounted yet. Clients, e.g. worker40 and others, then connect and are served. Afterwards /var/lib/openqa/share is mounted on OSD over the already existing directory, causing clients to misbehave, as reported by acarvajal in https://suse.slack.com/archives/C02CANHLANP/p1718811455420459. We can probably ensure that nfs-server only starts after /var/lib/openqa/share is completely available, either by adding explicit systemd unit requirements or by providing the underlying mount points instead of bind mount directories.

## Acceptance criteria

* **AC1:** `ls /var/lib/openqa/share/` lists content on OSD workers using that directory from NFS exports after OSD server reboots

## Suggestions

* Verify in production with a planned and monitored OSD reboot (after the corresponding sibling ticket about xfs_repair OOM) *or* try to reproduce the problem on the server side on OSD with `qemu-system-x86_64 -m 8192 -snapshot -hda /dev/vda -hdb /dev/vdb -hdc /dev/vdc -hdd /dev/vdd -hde /dev/vde -nographic -serial mon:stdio -smp 4` and then accessing the NFS mount from within that VM. Or try to reproduce the problem in plain VMs
* Research upstream about NFS server and systemd units and dependencies on mount points
* See if we can ensure services start after mount points are accessible, specifically /var/lib/openqa/share before nfs-server is started, likely with a systemd override file adding a "RequiresMountsFor", like https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1207/diffs#fb557f9ca291facc4d54992e48f7126c56c74208_442_448
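
For illustration, the systemd override from the last suggestion could look roughly like the drop-in below. This is only a sketch, not the content of the linked merge request: the drop-in path and file name are assumptions, and it presumes that /var/lib/openqa/share is backed by a mount unit systemd knows about (for example from an /etc/fstab entry). `RequiresMountsFor=` adds both `Requires=` and `After=` dependencies on the mount units needed for the given path.

```ini
# Hypothetical drop-in location (an assumption, not taken from the ticket):
# /etc/systemd/system/nfs-server.service.d/override.conf
[Unit]
# Start nfs-server only after the mount(s) needed to reach this path are active;
# systemd derives Requires= and After= on the matching mount units.
RequiresMountsFor=/var/lib/openqa/share
```

After adding such a file, `systemctl daemon-reload` followed by `systemctl cat nfs-server.service` should show whether the drop-in is picked up. Whether it actually prevents the race would still need the reboot or VM test described in the suggestions.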