I ran
qemu-system-x86_64 -m 8192 -snapshot -drive file=/dev/vda,if=virtio -drive file=/dev/vdb,if=virtio -drive file=/dev/vdc,if=virtio -drive file=/dev/vdd,if=virtio -drive file=/dev/vde,if=virtio -nographic -serial mon:stdio -smp 4
on OSD while setting the second trailing number (the fsck pass field) on each mount in /etc/fstab for vdb…e to 0. With that the system actually booted, but then a lot of journal errors cropped up, so I stopped again. Maybe the temporary storage backing "-snapshot" was exhausted. Trying to simulate locally with
qemu-system-x86_64 -m 8192 -enable-kvm -snapshot -drive file=SLES15-SP6-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2,if=virtio -drive file=/dev/null,if=virtio -drive file=/dev/null,if=virtio -drive file=/dev/null,if=virtio -drive file=/dev/null,if=virtio -nographic -serial mon:stdio -smp 4
but this shows nothing usable after the initial GRUB loading. I guess the serial console isn't properly enabled in the guest.
Booted locally with a graphical interface using
for i in vdb vdc vdd vde ; do qemu-img create -f qcow2 $i.qcow2 2G; done
qemu-system-x86_64 -m 8192 -enable-kvm -drive file=SLES15-SP6-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2,if=virtio -drive file=vdb.qcow2,if=virtio -drive file=vdc.qcow2,if=virtio -drive file=vdd.qcow2,if=virtio -drive file=vde.qcow2,if=virtio -smp 4
and then in /etc/default/grub enabled "serial console" instead of "gfxterm", set GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200" and called update-bootloader (settings sketched below). Also ran
zypper ref && zypper -n in nfs-kernel-server
so that I can test NFS with that.
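For reference, the relevant part of /etc/default/grub after the change probably looked like this (a minimal sketch; GRUB_TERMINAL is the variable that previously held "gfxterm"):
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"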
Afterwards booted again with
qemu-system-x86_64 -m 8192 -enable-kvm -drive file=SLES15-SP6-Minimal-VM.x86_64-kvm-and-xen-GM.qcow2,if=virtio -drive file=vdb.qcow2,if=virtio -drive file=vdc.qcow2,if=virtio -drive file=vdd.qcow2,if=virtio -drive file=vde.qcow2,if=virtio -smp 4 -nographic -serial mon:stdio
Then I copied over the content of /etc/exports and /etc/fstab from OSD, replacing the UUIDs with my clean simulation devices vdb…e. I only added the mount point definitions but did not create filesystems yet, to simulate problems on those devices. On boot the system was stuck in emergency mode, which was not entirely expected as we had "nofail" on all partitions. The error:
[ 4.393865][ T440] systemd-fstab-generator[440]: Failed to create unit file '/run/systemd/generator/srv.mount', as it already exists. Duplica?
[ 4.399227][ T434] (sd-execu[434]: /usr/lib/systemd/system-generators/systemd-fstab-generator failed with exit status 1.
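The duplicate comes from two fstab entries both targeting /srv: the image's original btrfs subvolume mount plus the line copied from OSD, roughly like this (hypothetical sketch, UUID and options made up):
UUID=1234-abcd /srv btrfs subvol=/@/srv 0 0
/dev/vdb /srv xfs defaults,nofail 1 2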
I disabled the original mount for /srv with the btrfs subvolume. But then I didn't see kernel and systemd boot output on the serial terminal anymore and didn't know what broke. OK, the "quiet" flag was set, so I removed it from /etc/default/grub and called update-bootloader again.
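That flag lives in the kernel command line variable in /etc/default/grub, i.e. roughly (hypothetical, any other flags elided):
GRUB_CMDLINE_LINUX_DEFAULT="" # "quiet" dropped so kernel messages reach the serial console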
Anyway, the system boots and eventually also shows a login prompt. There it looks like nfs-server is not started, because the mount point isn't populated the way it should be. OK, but I never enabled nfs-server. Did that now, created filesystems and the complete directory structure to populate all mount points as on OSD. Enabled nfs-server, rebooted, all good. Now I disabled the mount for /assets in /etc/fstab. I guess the problem is that the bind mount
/assets /var/lib/openqa/share none x-systemd.requires=/var/lib/openqa,x-systemd.automount,bind,nofail 0 0
can work regardless of whether something is mounted on /assets or not. Let's see. systemctl cat var-lib-openqa-share.mount
shows that we require var-lib-openqa.mount but not something like assets.mount. Should we also require that?
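Making it explicit would mean an additional x-systemd.requires on the bind mount line, roughly (hypothetical sketch; as it turns out further below, the generator apparently derives the dependency from What= anyway):
/assets /var/lib/openqa/share none x-systemd.requires=/assets,x-systemd.requires=/var/lib/openqa,x-systemd.automount,bind,nofail 0 0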
I did
/dev/vdc /assets xfs_foo defaults,logbsize=256k,noatime,nodiratime,nofail 1 2
/assets /var/lib/openqa/share none x-systemd.requires=/var/lib/openqa,x-systemd.automount,bind,nofail 0 0
so forcing a condition where a mount unit for /assets is defined but fails to mount, as "xfs_foo" is a purposely invalid filesystem type. The system still booted up to the point of reaching a login prompt and started sshd, but systemctl status nfs-server
shows that the NFS server did not start, with the message
Job nfs-server.service: Job nfs-server.service/start failed with result 'dependency'
which is what we wanted. Same for var-lib-openqa-share.mount. The message doesn't say which dependency failed, but OK. systemctl --failed
clearly shows that assets.mount
failed, so it should be clear where to start investigating. Now, what about the other devices, like vdb, vdd, vde? /srv/PSQL
also has no dependency on /srv, as apparent in systemctl cat var-lib-pgsql.mount. First simulating that all of vdb…e fail with "xfs_foo". That got me:
# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● assets.mount loaded failed failed /assets
● results.mount loaded failed failed /results
● space\x2dslow.mount loaded failed failed /space-slow
● srv.mount loaded failed failed /srv
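(As an aside, the \x2d above is just systemd's escaping of "-" in unit names derived from paths; the name can be reproduced with
systemd-escape -p --suffix=mount /space-slow
which prints space\x2dslow.mount.)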
So far as expected. And all dependent mount points are not mounted, which is good. systemctl status var-lib-pgsql.mount
for example says that it's inactive and mentions the missing dependency, but systemctl cat …
does not explicitly list it, so apparently the What=/srv/PSQL
is enough. Maybe we can actually remove some requires then (sketch after this paragraph)? Looks like everything still works fine. I assume the original problem for this ticket happened because we went a bit back and forth disabling complete lines in fstab. So let's see what happens if I disable lines completely. Then nfs-server still does not start, as the directory /var/lib/openqa/share does not exist. I installed openQA in my environment, which actually created the directory structure which nfs-server would expect, even though that is not the correct one. We could manually delete those directories under the unmounted mount points.
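The reduced line I'd test would be something like this (hypothetical sketch, dropping the explicit requires and relying on What=):
/assets /var/lib/openqa/share none x-systemd.automount,bind,nofail 0 0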
What about
[Unit]
ConditionPathIsMountPoint=/var/lib/openqa/share
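Assuming this goes in as a drop-in, e.g. via
systemctl edit nfs-server.service
which creates /etc/systemd/system/nfs-server.service.d/override.conf with that content.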
Now trying to start the service gets it skipped with
NFS server and services was skipped because of an unmet condition check (ConditionPathIsMountPoint=/var/lib/openqa/share).
That sounds promising. Enabled all mounts in /etc/fstab again, triggered a reboot and verified that this worked fine. So we can add that extra condition, I guess. Put it into salt and tested with
salt --no-color --state-output=changes 'openqa.suse.de' state.test etc.master.init | grep -v 'Result: Clean'
salt --no-color --state-output=changes 'openqa.suse.de' state.apply etc.master.init | grep -v 'Result: Clean'
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1213
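The gist of the salt change, as a hypothetical sketch (the real state layout is in the MR above; a daemon-reload on change is omitted here):
/etc/systemd/system/nfs-server.service.d/openqa.conf:
  file.managed:
    - makedirs: True
    - contents: |
        [Unit]
        ConditionPathIsMountPoint=/var/lib/openqa/share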