action #162365
OSD can fail on xfs_repair OOM conditions size:S (closed)
Description
Updated by okurz 5 months ago
- Copied from action #162362: 2024-06-15 osd not accessible - ensure healthy filesystems size:S added
Updated by jbaier_cz 5 months ago
From what I could find out, the on-boot checks are triggered by systemd-fsck@.service, which simply invokes fsck.xfs for the problematic filesystems. According to the man page, however:
XFS is a journaling filesystem and performs recovery at mount(8) time if necessary, so fsck.xfs simply exits with a zero exit status. However, the system administrator can force fsck.xfs to run xfs_repair(8) at boot time by creating a /forcefsck file or booting the system with "fsck.mode=force" on the kernel command line.
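For reference, a quick way to see whether a forced check is actually requested for the next boot is to look for the two triggers the man page mentions. A minimal sketch, assuming a root shell; the paths come from fsck.xfs(8)/systemd-fsck(8):
# check both documented triggers for a forced on-boot fsck
if [ -e /forcefsck ] || grep -q 'fsck.mode=force' /proc/cmdline; then
    echo "a forced fsck is requested for the next boot"
else
    echo "no forced fsck requested"
fi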
So there are no on-boot checks for our XFS filesystems unless someone asks for them explicitly. That may well have been the case this time, as suggested in the log:
Jun 16 10:46:51 openqa systemd[1]: Starting File System Check on /dev/disk/by-uuid/b5377fcf-6273-4f38-a471-cefff08c60b7...
Jun 16 10:46:51 openqa systemd[1]: Starting File System Check on /dev/disk/by-uuid/0b735e2d-b300-4a37-b39b-65fdc41d1ad1...
Jun 16 10:46:51 openqa systemd[1]: Starting File System Check on /dev/disk/by-uuid/40c5b7dd-487f-442b-ba95-b6186b18f3a7...
Jun 16 10:46:51 openqa systemd-fsck[715]: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Jun 16 10:46:51 openqa systemd[1]: Starting File System Check on /dev/disk/by-uuid/37972261-7411-44eb-bae9-90d1cc1c9a38...
Jun 16 10:46:51 openqa systemd-fsck[717]: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Jun 16 10:46:51 openqa systemd-fsck[720]: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Jun 16 10:46:51 openqa systemd-fsck[722]: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Jun 16 10:46:51 openqa systemd-fsck[735]: /sbin/fsck.xfs error: The filesystem log is dirty, mount it to recover the log. If that fails, refer to the section DIRTY LOGS in the xfs_repair manual page.
Jun 16 10:46:51 openqa systemd-fsck[717]: fsck failed with exit status 4.
Jun 16 10:46:51 openqa systemd-fsck[717]: Running request emergency.target/start/replace
Jun 16 10:46:51 openqa systemd[1]: systemd-fsck@dev-disk-by\x2duuid-0b735e2d\x2db300\x2d4a37\x2db39b\x2d65fdc41d1ad1.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 10:46:51 openqa systemd[1]: systemd-fsck@dev-disk-by\x2duuid-0b735e2d\x2db300\x2d4a37\x2db39b\x2d65fdc41d1ad1.service: Failed with result 'exit-code'.
Jun 16 10:46:51 openqa systemd[1]: Failed to start File System Check on /dev/disk/by-uuid/0b735e2d-b300-4a37-b39b-65fdc41d1ad1.
Jun 16 10:46:51 openqa systemd[1]: Dependency failed for /space-slow.
Jun 16 10:46:51 openqa systemd[1]: Dependency failed for Local File Systems.
Jun 16 10:46:51 openqa systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jun 16 10:46:51 openqa systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jun 16 10:46:51 openqa systemd[1]: space\x2dslow.mount: Job space\x2dslow.mount/start failed with result 'dependency'.
Jun 16 10:46:51 openqa systemd[1]: Finished File System Check on /dev/disk/by-uuid/40c5b7dd-487f-442b-ba95-b6186b18f3a7.
Jun 16 10:46:51 openqa systemd[1]: Mounting /srv...
Jun 16 10:47:03 openqa systemd[1]: Finished File System Check on /dev/disk/by-uuid/37972261-7411-44eb-bae9-90d1cc1c9a38.
Jun 16 10:47:03 openqa systemd[1]: Mounting /assets...
Unfortunately, the machine was rebooted again before the checks were completed.
The next run was also unsuccessful, this time due to an OOM kill:
Jun 16 10:53:29 openqa kernel: xfs_repair invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jun 16 10:53:29 openqa kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-systemd\x2dfsck.slice,task=xfs_repair,pid=736,uid=0
Jun 16 10:53:29 openqa kernel: Out of memory: Killed process 736 (xfs_repair) total-vm:22753620kB, anon-rss:22335672kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:44040kB oom_score_adj:0
Jun 16 10:53:31 openqa kernel: oom_reaper: reaped process 736 (xfs_repair), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jun 16 10:53:31 openqa systemd-fsck[729]: /sbin/fsck.xfs: line 69: 736 Killed xfs_repair -e $DEV
Jun 16 10:53:31 openqa systemd-fsck[729]: /sbin/fsck.xfs error: An unknown return code from xfs_repair '137'
Jun 16 10:53:31 openqa systemd-fsck[715]: fsck failed with exit status 4.
Jun 16 10:53:31 openqa systemd[1]: systemd-fsck@dev-disk-by\x2duuid-b5377fcf\x2d6273\x2d4f38\x2da471\x2dcefff08c60b7.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 10:53:31 openqa systemd[1]: systemd-fsck@dev-disk-by\x2duuid-b5377fcf\x2d6273\x2d4f38\x2da471\x2dcefff08c60b7.service: Failed with result 'exit-code'.
Jun 16 10:53:31 openqa systemd[1]: Failed to start File System Check on /dev/disk/by-uuid/b5377fcf-6273-4f38-a471-cefff08c60b7.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for /results.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for Local File Systems.
Jun 16 10:53:31 openqa systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for /var/lib/openqa.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for var-lib-openqa-share.automount.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa-share.automount: Job var-lib-openqa-share.automount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for var-lib-openqa-share-factory-hdd-fixed.automount.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa-share-factory-hdd-fixed.automount: Job var-lib-openqa-share-factory-hdd-fixed.automount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for var-lib-openqa-archive.automount.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa-archive.automount: Job var-lib-openqa-archive.automount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for /var/lib/openqa/share.
Jun 16 10:53:31 openqa systemd[1]: Dependency failed for var-lib-openqa-share-factory-iso-fixed.automount.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa-share-factory-iso-fixed.automount: Job var-lib-openqa-share-factory-iso-fixed.automount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa-share.mount: Job var-lib-openqa-share.mount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: var-lib-openqa.mount: Job var-lib-openqa.mount/start failed with result 'dependency'.
Jun 16 10:53:31 openqa systemd[1]: results.mount: Job results.mount/start failed with result 'dependency'.
The machine was again rebooted before the last running check was able to complete. This is also the case for a few (5) subsequent boots.
Jun 16 11:41:34 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
Jun 16 11:59:35 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
Jun 17 09:00:03 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
Jun 17 09:03:15 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
Jun 17 09:33:03 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
The subsequent boot was different: for some reason the main filesystem remained read-only, which introduced more service failures:
Jun 17 09:45:55 openqa kernel: Linux version 5.14.21-150500.55.65-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.41.0.20230908-150100.7.46) #1 SMP PREEMPT_DYNAMIC Thu May 23 04:57:11 UTC 2024 (a46829d)
Jun 17 09:45:58 openqa systemd-random-seed[638]: Failed to remove extended attribute, ignoring: Read-only file system
Jun 17 09:46:17 openqa systemd-tmpfiles[1157]: rm_rf(/tmp): Read-only file system
...
Jun 17 09:46:18 openqa (genrules)[1169]: augenrules.service: Failed to set up mount namespacing: /run/systemd/unit-root/dev: Read-only file system
Jun 17 09:46:18 openqa (genrules)[1169]: augenrules.service: Failed at step NAMESPACE spawning /sbin/augenrules: Read-only file system
Jun 17 09:46:18 openqa systemd[1]: Failed to start auditd rules generation.
Jun 17 09:46:18 openqa wickedd[1255]: unable to create unique tempfile in /tmp
Jun 17 09:46:18 openqa wickedd[1255]: __ni_objectmodel_write_message: unable to create tempfile for script arguments
Jun 17 09:46:18 openqa dbus-daemon[1170]: [system] Rejected send message, 0 matched rules; type="error", sender=":1.8" (uid=0 pid=1255 comm="/usr/sbin/wickedd --systemd --foreground ") interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.Failed" requested_rep>
Jun 17 09:46:18 openqa wickedd-nanny[1259]: device eth0: call to org.opensuse.Network.Firewall.firewallUp() failed: Invalid arguments
Jun 17 09:46:18 openqa wickedd-nanny[1259]: eth0: failed to bring up device, still continuing
...
Jun 17 09:46:24 openqa systemd[1]: Failed to start PostgreSQL database server.
Jun 17 09:46:24 openqa systemd[1]: openqa-gru.service: Job openqa-gru.service/start failed with result 'dependency'.
Jun 17 09:46:24 openqa nrpe[1474]: Starting up daemon
Jun 17 09:46:24 openqa systemd[1]: Dependency failed for Setup local PostgreSQL database for openQA.
...
Jun 17 09:46:25 openqa nfsdcltrack[1529]: Unexpected error when checking access on /var/lib/nfs/nfsdcltrack: Read-only file system
Jun 17 09:46:25 openqa kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
...
Jun 17 09:46:27 openqa salt-master[1368]: [ERROR ] An un-handled exception was caught by Salt's global exception handler:
Jun 17 09:46:27 openqa salt-master[1368]: OSError: [Errno 30] Read-only file system: '/var/cache/salt/master/.root_key'
Jun 17 09:46:27 openqa salt-master[1368]: Traceback (most recent call last):
Jun 17 09:46:27 openqa salt-master[1368]: File "/usr/bin/salt-master", line 22, in <module>
Jun 17 09:46:27 openqa salt-master[1368]: salt_master()
...
Jun 17 09:46:27 openqa systemd[1]: salt-master.service: Main process exited, code=exited, status=1/FAILURE
Jun 17 09:46:27 openqa systemd[1]: salt-master.service: Failed with result 'exit-code'.
...
Jun 17 09:52:52 openqa umount[6390]: umount: /var/lib/openqa/share/factory/iso/fixed: not mounted.
Jun 17 09:52:52 openqa systemd[1]: assets-factory-hdd-fixed.mount: Deactivated successfully.
Jun 17 09:52:52 openqa systemd[1]: Unmounted /assets/factory/hdd/fixed.
Jun 17 09:52:52 openqa systemd[1]: assets-factory-iso-fixed.mount: Deactivated successfully.
Jun 17 09:52:52 openqa systemd[1]: Unmounted /assets/factory/iso/fixed.
Jun 17 09:52:52 openqa systemd[1]: var-lib-openqa-share-factory-hdd-fixed.mount: Deactivated successfully.
Jun 17 09:52:52 openqa systemd[1]: Unmounted /var/lib/openqa/share/factory/hdd/fixed.
Jun 17 09:52:52 openqa systemd[1]: var-lib-openqa-share-factory-iso-fixed.mount: Mount process exited, code=exited, status=32/n/a
Jun 17 09:52:52 openqa systemd[1]: var-lib-openqa-share-factory-iso-fixed.mount: Failed with result 'exit-code'.
Jun 17 09:52:52 openqa systemd[1]: Unmounted /var/lib/openqa/share/factory/iso/fixed.
Jun 17 09:52:52 openqa systemd[1]: var-lib-pgsql.mount: Deactivated successfully.
Jun 17 09:52:52 openqa systemd[1]: Unmounted /var/lib/pgsql.
Jun 17 09:52:52 openqa systemd[1]: var-lib-openqa-share-factory-hdd-fixed.automount: Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?
Jun 17 09:52:52 openqa systemd[1]: var-lib-openqa-share-factory-hdd-fixed.automount: Failed with result 'unmounted'.
Jun 17 09:52:52 openqa systemd[1]: Unset automount var-lib-openqa-share-factory-hdd-fixed.automount.
I guess this is also the run where the two problematic mount points were disabled in fstab, so openQA started without the proper mounts being present (and created some further problems which later prevented mounting them properly).
If the fsck is forced during boot, fsck.xfs just invokes xfs_repair on the device:
if $FORCE; then
    xfs_repair -e $DEV
    repair2fsck_code $?
    exit $?
fi
Normally, the implicit memory limit shouldn't be a problem (see the man page):
-m maxmem Specifies the approximate maximum amount of memory, in megabytes, to use for xfs_repair. xfs_repair has its own internal block cache which will scale out up to the lesser of the process's virtual address limit or about 75% of the system's physical RAM. This option overrides these limits. NOTE: These memory limits are only approximate and may use more than the specified limit.
However, the forced checks during boot can all run in parallel (as documented in the fstab man page and illustrated with the hypothetical entries below):
The sixth field (fs_passno).
This field is used by fsck(8) to determine the order in which filesystem checks are done at boot time. The root filesystem should be specified with a fs_passno of 1. Other filesystems should have a fs_passno of 2. Filesystems within a drive will be checked sequentially, but filesystems on different drives will be checked at the same time to utilize parallelism available in the hardware. Defaults to zero (don't check the filesystem) if not present.
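For illustration, hypothetical fstab entries like these (UUIDs made up, mount points as in the log above) both request a check via the sixth field, so their on-boot checks get started together:
# <device>                                 <mountpoint>  <type>  <options>  <dump>  <passno>
UUID=aaaaaaaa-0000-0000-0000-000000000001  /results      xfs     defaults   0       2
UUID=bbbbbbbb-0000-0000-0000-000000000002  /space-slow   xfs     defaults   0       2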
Also, the task state in one of the OOM logs suggests there are actually two xfs_repair processes running (and if each may use up to 75% of the available memory, things can go wrong).
Jun 16 11:42:29 openqa kernel: Tasks state (memory values in pages):
Jun 16 11:42:29 openqa kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jun 16 11:42:29 openqa kernel: [ 456] 0 456 1978 573 53248 14 0 plymouthd
Jun 16 11:42:29 openqa kernel: [ 624] 0 624 2380 48 57344 0 -1000 dmeventd
Jun 16 11:42:29 openqa kernel: [ 634] 0 634 22930 646 225280 140 -250 systemd-journal
Jun 16 11:42:29 openqa kernel: [ 655] 0 655 15694 702 163840 0 -1000 systemd-udevd
Jun 16 11:42:29 openqa kernel: [ 708] 0 708 9550 2 139264 217 0 systemd-fsck
Jun 16 11:42:29 openqa kernel: [ 712] 0 712 2120 1 61440 63 0 fsck
Jun 16 11:42:29 openqa kernel: [ 717] 0 717 9550 1 143360 217 0 systemd-fsck
Jun 16 11:42:29 openqa kernel: [ 727] 0 727 2119 928 53248 142 0 haveged
Jun 16 11:42:29 openqa kernel: [ 728] 0 728 2120 1 57344 63 0 fsck
Jun 16 11:42:29 openqa kernel: [ 730] 0 730 6369 0 106496 204 0 fsck.xfs
Jun 16 11:42:29 openqa kernel: [ 732] 0 732 4638901 4530153 36679680 34715 0 xfs_repair
Jun 16 11:42:29 openqa kernel: [ 734] 0 734 6369 0 102400 204 0 fsck.xfs
Jun 16 11:42:29 openqa kernel: [ 737] 0 737 3114710 2958462 24522752 88914 0 xfs_repair
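For scale, assuming 4 KiB pages: the two xfs_repair instances in the dump above hold about 4,530,153 × 4 KiB ≈ 17.3 GiB and 2,958,462 × 4 KiB ≈ 11.3 GiB of RSS, i.e. roughly 28.6 GiB combined, while the earlier single-instance OOM already reported about 22 GB of anon-rss for one repair.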
The recommendation seems to be: simply do not run forced parallel on-boot checks for XFS, and when needed run the repair explicitly from the emergency shell.
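A minimal sketch of how that could look from the emergency shell, one filesystem at a time and with an explicit memory cap (the UUID is taken from the log above, the 4096 MB cap is an assumption):
# repair one filesystem at a time; -m caps xfs_repair's block cache in MB,
# -e is the same flag fsck.xfs passes
xfs_repair -m 4096 -e /dev/disk/by-uuid/b5377fcf-6273-4f38-a471-cefff08c60b7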
Updated by okurz 5 months ago
So is your assumption that someone from IT forced an XFS check for multiple XFS filesystems in parallel by creating /forcefsck files? I consider it more likely that there is some systemd magic creating that file, or that the warning about preferring the kernel command line parameter is misleading. In any case, consider getting in contact with the relevant people from IT and asking how they triggered XFS checks, if at all; if they did not, coordinate when to do monitored reboot attempts.
Updated by jbaier_cz 5 months ago
Yeah, it could be misleading (although according to the code it does not look like systemd magic here); nevertheless, I asked that question in SD-159799.
For my part, I ran xfs_repair from the host (I had no access from within the VM at the time).
Could that flag the filesystems for another round of fsck during the next boot?
Updated by jbaier_cz 5 months ago
Also, the task state in one of the OOM logs suggests there are actually two xfs_repair processes running (and if each may use up to 75% of the available memory, things can go wrong).
I guess #162332#note-19 proves this hypothesis. The question remains why the filesystem checks are running on every boot. Could it be that they did not complete successfully and the FS is still marked as dirty? Or is the man page about running fsck just wrong?
Updated by okurz 5 months ago
jbaier_cz wrote in #note-6:
The question remains why the filesystem checks are running on every boot. Could it be that they did not complete successfully and the FS is still marked as dirty?
Yes, I am pretty sure that is what happened. There could be multiple reasons why a filesystem is marked as dirty and in need of a check, e.g. an unclean shutdown. I would not be concerned about why that happened, but I would expect such situations to happen again in the future. I guess what happens is this: at least two XFS filesystems trigger filesystem checks because of the "2" at the end of their lines in /etc/fstab, and all of them run in parallel. Maybe it works, although undocumented, to use individual, different numbers to serialize the operations. Also I wonder about the "1" for /home, whether that shouldn't be a 2 or a different number as well.
You could check in a simulated environment, maybe something like
qemu-system-x86_64 -m 8192 -snapshot -hda /dev/vda -hdb /dev/vdb -hdc /dev/vdc -hdd /dev/vdd -drive file=/dev/vde,format=raw,if=virtio -nographic -serial mon:stdio -smp 4
or try to patch the xfs_repair calls to limit memory, or disable the boot-up filesystem checks.
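A minimal sketch of the "disable the boot-up filesystem checks" option, assuming it is done via /etc/fstab (the entry below is hypothetical): a sixth field of 0 means the filesystem is not checked at boot, and a repair would then only run when started by hand:
# hypothetical /etc/fstab entry: passno 0 (last field) disables the on-boot check
UUID=bbbbbbbb-0000-0000-0000-000000000002  /space-slow  xfs  defaults  0  0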
Updated by okurz 5 months ago
- Copied to action #162590: NFS mounts are stuck on OSD workers if partitions on OSD fail to come up properly on boot size:S added
Updated by jbaier_cz 5 months ago
Did some additional research about fs_passno for numbers > 2. It is generally allowed to use higher numbers, however:
- in non-systemd systems, i.e. where util-linux takes care of running fsck, the filesystems are indeed checked in the provided order: https://github.com/util-linux/util-linux/blob/2c308875a7fa1aaa44892c368f6b37bcfcb8879a/disk-utils/fsck.c#L1324-L1368
- for modern systemd-enabled systems (our case), the passno is actually interpreted as a binary value: https://github.com/systemd/systemd/blob/8a7033ac2f3ac6d30ae7f453074a1ed47243fbf2/src/fstab-generator/fstab-generator.c#L393; the distinction between "1" and "2" is handled differently (systemd-fsck@.service vs systemd-fsck-root.service)
It looks like patching /sbin/fsck.xfs would be the easiest solution. We just need to do it in some not-so-hacky way.
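A minimal sketch of what such a patch could look like, only touching the FORCE branch quoted earlier in this ticket; the 25%-of-RAM cap per instance is an assumption, not an upstream recommendation:
if $FORCE; then
    # cap each xfs_repair instance to roughly 25% of physical RAM so that
    # two parallel repairs cannot exhaust memory together
    mem_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
    xfs_repair -m $(( mem_kb / 1024 / 4 )) -e $DEV
    repair2fsck_code $?
    exit $?
fi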
Updated by jbaier_cz 5 months ago
I did some more research and stumbled upon a rather old Debian bug where a similar problem was discussed (in that case parallel fsck on multiple LVM volumes backed by the same disks). It looks like this problem is known and there is no real plan to solve it. However, there are two potential ideas:
- comment 81 suggests to use fsck -A instead of the systemd services (a minimal sketch follows below):
  Another normal (or maybe important) bug for systemd to not launch fsck's for LVM or MD devices in parallel, wait for all other running fscks to finish, then launch one and only one fsck for LVM or MD, etc.; thats what fsck -A does, if systemd cannot do it better then just use fsck -A that works correctly, don't reinvent an inferior wheel!
- comment 59 suggests to ignore the issue or disable the systemd services:
  My recommendation is to ignore this issue, or if you really see any performance problem than disable fsck by systemd and use by hands written script to call fsck.
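For completeness, a rough sketch of the hand-rolled variant both comments hint at. Assumptions: it is run from a maintenance/emergency shell with the filesystems unmounted; -A, -R, -T and -s are documented util-linux fsck options, and -f is forwarded to fsck.xfs (fsck passes unrecognized options on to the filesystem-specific checker) to force the repair:
# check everything listed in /etc/fstab except /, one filesystem at a time
fsck -A -R -T -s -f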