action #162590

Updated by okurz 6 months ago

## Motivation 
As observed in #162365, mounts on OSD do not always come up properly, and we no longer want to treat them as critical for boot. What likely happens is that nfs-server.service on OSD starts on boot even if /var/lib/openqa/share is not mounted yet. Clients, e.g. worker40 and others, then connect and are served, and afterwards /var/lib/openqa/share is mounted on OSD over the already existing directory, causing the clients to misbehave as reported in https://suse.slack.com/archives/C02CANHLANP/p1718811455420459 by acarvajal. We can probably ensure that nfs-server only starts once /var/lib/openqa/share is completely available, either by adding explicit systemd unit requirements or by exporting the underlying mount points instead of the bind-mounted directories.

 ## Acceptance criteria 
* **AC1:** `ls /var/lib/openqa/share/` lists content on OSD workers that use that directory from the NFS exports, also after an OSD reboot
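The AC1 check could be automated on a worker roughly as follows. This is a minimal sketch, assuming that "lists content" simply means a non-empty directory listing; the helper name `share_has_content` is hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helper for the AC1 check, meant to run on an OSD worker.
# Succeeds only if the directory exists and lists at least one entry.
share_has_content() {
    share_dir="${1:-/var/lib/openqa/share}"
    # A missing or inaccessible directory fails the check right away.
    [ -d "$share_dir" ] || return 1
    # `ls -A` lists all entries except . and ..; empty output (or an error
    # such as a stale file handle, whose message goes to stderr) means the
    # worker does not see the exported content.
    [ -n "$(ls -A "$share_dir" 2>/dev/null)" ]
}
```

Usage: `share_has_content && echo ok || echo "share empty or inaccessible"`. On a hard NFS mount with a stale server side, `ls` may block instead of failing, so wrapping the call with coreutils `timeout` could be worth considering.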

 ## Suggestions 
 * Verify in production with a planned and monitored OSD reboot (after the corresponding sibling ticket about the xfs_repair OOM is resolved) *or* try to reproduce the problem on the server side on OSD with `qemu-system-x86_64 -m 8192 -snapshot -drive file=/dev/vda,if=virtio -drive file=/dev/vdb,if=virtio -drive file=/dev/vdc,if=virtio -drive file=/dev/vdd,if=virtio -drive file=/dev/vde,if=virtio -nographic -serial mon:stdio -smp 4` and try to access the NFS mount from within that VM. Alternatively, try to reproduce the problem in plain VMs.
 * Research upstream documentation about the NFS server, its systemd units and their dependencies on mount points
 * See if we can ensure that services only start after their mount points are accessible, specifically that /var/lib/openqa/share is mounted before nfs-server is started, likely with a systemd override file adding `RequiresMountsFor=`, similar to https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1207/diffs#fb557f9ca291facc4d54992e48f7126c56c74208_442_448
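A systemd override of that kind could look like the following. This is a minimal sketch, assuming the unit on OSD is named `nfs-server.service`; the function name `write_nfs_share_dropin`, its optional root-directory argument and the drop-in file name `90-requires-share-mount.conf` are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper that writes a drop-in for nfs-server.service.
# The optional first argument is the systemd configuration root, so the
# sketch can be tried against a scratch directory; on the real host it
# would be /etc/systemd/system.
write_nfs_share_dropin() {
    root="${1:-/etc/systemd/system}"
    dropin_dir="$root/nfs-server.service.d"
    mkdir -p "$dropin_dir" || return 1
    # RequiresMountsFor= adds Requires= and After= dependencies on all
    # mount units needed to access the listed path (the mount at that
    # path and any mounts on its parent directories), so nfs-server is
    # only started once /var/lib/openqa/share is mounted.
    printf '%s\n' \
        '[Unit]' \
        'RequiresMountsFor=/var/lib/openqa/share' \
        > "$dropin_dir/90-requires-share-mount.conf"
}
```

After writing the file on the host, `systemctl daemon-reload` is needed for systemd to pick it up, and `systemctl cat nfs-server.service` shows whether the drop-in is applied. Because the dependency is of type `Requires=`, nfs-server would not start at all if the mount fails, which seems to be the intended behavior here since serving the unmounted directory is what confuses the clients. In production this would more likely be deployed through salt-states-openqa, as in the linked merge request.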

 ## Rollback steps 
 * Remove alert silence `alertname=Failed systemd services` on https://stats.openqa-monitor.qa.suse.de/alerting/silences
