action #162590
Updated by okurz 6 months ago
## Motivation
As observed in #162365, mounts on OSD do not always come up properly on boot, and we no longer want to treat them as critical for boot. What likely happens is that nfs-server.service on OSD starts on boot even if /var/lib/openqa/share is not mounted yet. Clients, e.g. worker40 and others, then connect and are served from the not-yet-mounted directory. Afterwards /var/lib/openqa/share is mounted on OSD over the already existing directory, causing clients to misbehave, as reported by acarvajal in https://suse.slack.com/archives/C02CANHLANP/p1718811455420459. We can probably ensure that nfs-server only starts after /var/lib/openqa/share is completely available, either by adding explicit systemd unit requirements or by providing the underlying mount points instead of bind mount directories.
## Acceptance criteria
* **AC1:** `ls /var/lib/openqa/share/` lists content on OSD workers using that directory from NFS exports after the OSD server reboots
## Suggestions
* Verify in production with a planned and monitored OSD reboot (after the corresponding sibling ticket about xfs_repair OOM) *or* try to reproduce the problem on the server side on OSD with `qemu-system-x86_64 -m 8192 -snapshot -drive file=/dev/vda,if=virtio -drive file=/dev/vdb,if=virtio -drive file=/dev/vdc,if=virtio -drive file=/dev/vdd,if=virtio -drive file=/dev/vde,if=virtio -hda /dev/vda -hdb /dev/vdb -hdc /dev/vdc -hdd /dev/vdd -hde /dev/vde -nographic -serial mon:stdio -smp 4` and then try to access the NFS mount from within that VM. Alternatively, try to reproduce the problem in plain VMs
* Research upstream about NFS server and systemd units and dependencies on mount points
* See if we can ensure services start only after mount points are accessible, specifically that /var/lib/openqa/share is mounted before nfs-server is started, likely with a systemd override file adding a "RequiresMountsFor", like https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1207/diffs#fb557f9ca291facc4d54992e48f7126c56c74208_442_448
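
The override suggested in the last point could look roughly like the following drop-in. This is a minimal sketch, not the content of the linked merge request: the drop-in file name is an assumption, and it presumes /var/lib/openqa/share is defined as a mount (e.g. in /etc/fstab) so that systemd generates a mount unit for it. `RequiresMountsFor=` adds both `Requires=` and `After=` dependencies on the mount units needed for the given path.

```ini
# Hypothetical drop-in path: /etc/systemd/system/nfs-server.service.d/override.conf
# Assumption: /var/lib/openqa/share has a corresponding systemd mount unit.
[Unit]
# Start nfs-server only after the share is mounted, and not at all if mounting fails
RequiresMountsFor=/var/lib/openqa/share
```

After adding such a file, `systemctl daemon-reload` followed by `systemctl cat nfs-server.service` should show whether the drop-in is picked up. Note that with this dependency nfs-server will not start if the mount fails, which may or may not be desired given that the mounts should no longer be critical for boot.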