action #137519
closed
[alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount, kernel modules already removed with old kernel still running size:M
Added by tinita about 1 year ago.
Updated about 1 year ago.
Description
Observation
Date: Fri, 06 Oct 2023 09:16:02 +0200
Check failed systemd services on hosts with systemctl --failed. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.
2023-10-06 10:43:00 | openqaworker1 | proc-sys-fs-binfmt_misc.mount | 1
From systemctl status proc-sys-fs-binfmt_misc.mount:
Okt 06 13:03:47 openqaworker1 systemd[1]: Mounting Arbitrary Executable File Formats File System...
Okt 06 13:03:47 openqaworker1 mount[10567]: mount: /proc/sys/fs/binfmt_misc: unknown filesystem type 'binfmt_misc'.
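For reference, this is essentially what the mount unit attempts; a minimal reproduction sketch (run as root on the affected host, commands only):

openqaworker1:~ # modprobe binfmt_misc   # fails when /lib/modules/$(uname -r) no longer contains the module
openqaworker1:~ # mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc   # without the module the kernel reports "unknown filesystem type"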
Suggestions
- Log into the host and check what the state is (see the triage sketch after this list)
- Confirm how that mount is set up and how it is meant to work
- Pause the alert and reduce urgency (as long as nothing is burning)
- Understand why this only happens on this worker
- Research whether there was a change in the product/kernel and look up related SUSE bugs
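A triage sketch for the first two suggestions (unit names taken from the alert; systemctl cat shows the static unit definition):

openqaworker1:~ # systemctl --failed
openqaworker1:~ # systemctl status proc-sys-fs-binfmt_misc.mount proc-sys-fs-binfmt_misc.automount
openqaworker1:~ # systemctl cat proc-sys-fs-binfmt_misc.mount   # What=binfmt_misc, Where=/proc/sys/fs/binfmt_misc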
- Target version set to Ready
- Priority changed from Normal to Urgent
- Description updated (diff)
- Priority changed from Urgent to High
- Related to action #137522: [alert] alerts about "host: sushil-linux-tw-kde" that tools team should not be notified about, e.g. Inode utilization inside the OSD infrastructure is too high size:M added
- Description updated (diff)
- Priority changed from High to Urgent
- Assignee set to nicksinger
- Subject changed from [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount to [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount size:M
- Description updated (diff)
- Status changed from New to Workable
Some sysctl call seems to have caused the automount to attempt to mount /proc/sys/fs/binfmt_misc:
openqaworker1:/lib/modules # systemctl status proc-sys-fs-binfmt_misc.automount
× proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static)
Active: failed (Result: mount-start-limit-hit) since Fri 2023-10-06 13:03:47 CEST; 36min ago
Triggers: ● proc-sys-fs-binfmt_misc.mount
Where: /proc/sys/fs/binfmt_misc
Docs: https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
Okt 05 09:14:41 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3180 (sysctl)
Okt 06 08:21:36 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1965 (sysctl)
Okt 06 08:23:26 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 2141 (sysctl)
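For context, sysctl -a recursively reads every file under /proc/sys, so any such invocation (or anything else touching the subtree) crosses the /proc/sys/fs/binfmt_misc automount point and triggers a mount attempt. A hedged illustration:

openqaworker1:~ # sysctl -a > /dev/null   # the recursive walk of /proc/sys enters fs/binfmt_misc and trips the automount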
A quick check confirms that this should be provided by a kernel module:
openqaworker1:/lib/modules # zgrep -i binfmt_misc /proc/config.gz
CONFIG_BINFMT_MISC=m
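=m means binfmt_misc is built as a loadable module rather than into the kernel, so the mount can only succeed if the module file exists for the running kernel. A hedged way to check:

openqaworker1:~ # modinfo -k "$(uname -r)" binfmt_misc   # errors out if the module was purged from /lib/modules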
Checking /lib/modules/ shows that only modules for the newer kernel (5.14.21-150400.24.88-default) are present, while uname -r shows that 5.14.21-150400.24.81-default is currently running.
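The mismatch can be confirmed in one line (a sketch reconstructing the observation above):

openqaworker1:~ # test -d "/lib/modules/$(uname -r)" || echo "no modules for running kernel $(uname -r)"
no modules for running kernel 5.14.21-150400.24.81-default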
According to others (https://suse.slack.com/archives/C02AJ1E568M/p1696591927454759) it seems that purge-kernels.service is responsible for the cleanup, and it indeed ran 1 day and 3 hours ago while the system has an uptime of 5 days. So something triggered it (why?), and for some reason it cleaned up modules that are still needed (why?).
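For reference, purge-kernels on SUSE keeps the kernels selected by the multiversion.kernels option in /etc/zypp/zypp.conf, and the running keyword in the default value is what normally protects the booted kernel, so both the trigger and the retention decision should be inspectable (a sketch, assuming default config locations):

openqaworker1:~ # journalctl -u purge-kernels.service   # when it ran and which packages it removed
openqaworker1:~ # grep multiversion.kernels /etc/zypp/zypp.conf   # default is latest,latest-1,running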
- Status changed from Workable to In Progress
- Due date set to 2023-10-21
Setting due date based on mean cycle time of SUSE QE Tools
- Subject changed from [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount size:M to [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount, kernel modules already removed with old kernel still running size:M
- Status changed from In Progress to Resolved
The machine has rebooted in the meantime (a reboot was already scheduled before the weekend when I checked) and I don't have any good idea how to debug further why this happened only on this single host. Therefore I will resolve the ticket for now; it can serve as a reference if the same happens again.