action #137519

closed

[alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount, kernel modules already removed with old kernel still running size:M

Added by tinita 7 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2023-10-06
Due date:
2023-10-21
% Done:

0%

Estimated time:
Tags:

Description

Observation

Date: Fri, 06 Oct 2023 09:16:02 +0200
Check failed systemd services on hosts with systemctl --failed. Hint: Go to parent dashboard https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services to see a list of affected hosts.

2023-10-06 10:43:00    openqaworker1    proc-sys-fs-binfmt_misc.mount    1

From systemctl status proc-sys-fs-binfmt_misc.mount:

Okt 06 13:03:47 openqaworker1 systemd[1]: Mounting Arbitrary Executable File Formats File System...
Okt 06 13:03:47 openqaworker1 mount[10567]: mount: /proc/sys/fs/binfmt_misc: unknown filesystem type 'binfmt_misc'.

Suggestions

  • Log into the host and check what the state is
  • Confirm how that mount is set up and how it's meant to work
  • Pause the alert and reduce urgency (as long as nothing's burning)
  • Understand why this only happens on this worker
  • Research whether there was a change in the product/kernel and look up related SUSE bugs

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #137522: [alert] alerts about "host: sushil-linux-tw-kde" that tools team should not be notified about, e.g. Inode utilization inside the OSD infrastructure is too high size:M (Resolved, jbaier_cz, 2023-10-06)

Actions #1

Updated by okurz 7 months ago

  • Target version set to Ready
Actions #2

Updated by okurz 7 months ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by okurz 7 months ago

  • Description updated (diff)
  • Priority changed from Urgent to High

Added silence

Actions #4

Updated by okurz 7 months ago

  • Related to action #137522: [alert] alerts about "host: sushil-linux-tw-kde" that tools team should not be notified about, e.g. Inode utilization inside the OSD infrastructure is too high size:M added
Actions #5

Updated by okurz 7 months ago

  • Description updated (diff)
  • Priority changed from High to Urgent

sorry, wrong ticket.

Actions #6

Updated by okurz 7 months ago

  • Assignee set to nicksinger
Actions #7

Updated by livdywan 7 months ago

  • Subject changed from [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount to [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #8

Updated by nicksinger 7 months ago

A sysctl call seems to have triggered the automount, which then attempted to mount /proc/sys/fs/binfmt_misc:

openqaworker1:/lib/modules # systemctl status proc-sys-fs-binfmt_misc.automount
× proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
     Loaded: loaded (/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static)
     Active: failed (Result: mount-start-limit-hit) since Fri 2023-10-06 13:03:47 CEST; 36min ago
   Triggers: ● proc-sys-fs-binfmt_misc.mount
      Where: /proc/sys/fs/binfmt_misc
       Docs: https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
             https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Okt 05 09:14:41 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3180 (sysctl)
Okt 06 08:21:36 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 1965 (sysctl)
Okt 06 08:23:26 openqaworker1 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 2141 (sysctl)
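
To see at a glance which processes triggered the automount requests, journal lines like the ones above can be summarised. This is a minimal sketch that assumes the message format shown in this log; the function name automount_triggers is hypothetical and not part of any tool:

```shell
# Summarise which processes triggered automount requests.
# Reads journal lines on stdin and prints "<count> <process name>" per process.
# Assumes the systemd message format seen above:
#   "... Got automount request for <path>, triggered by <pid> (<name>)"
automount_triggers() {
    sed -n 's/.*Got automount request for .*, triggered by [0-9][0-9]* (\([^)]*\)).*/\1/p' \
        | sort | uniq -c | sed 's/^ *//'
}
```

It could be fed from the journal of the unit, for example with `journalctl -u proc-sys-fs-binfmt_misc.automount --no-pager | automount_triggers`.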

A quick check confirms that binfmt_misc support is provided by a kernel module:

openqaworker1:/lib/modules # zgrep -i binfmt_misc /proc/config.gz
CONFIG_BINFMT_MISC=m

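Checking /lib/modules/ shows that only modules for the newer kernel, 5.14.21-150400.24.88-default, are present, while uname -r shows that 5.14.21-150400.24.81-default is currently running. The running kernel therefore cannot load the binfmt_misc module, which explains the "unknown filesystem type" error.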
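
The comparison between the running kernel and the installed module directories can be scripted. This is a minimal sketch, assuming modules live under /lib/modules/<release> as on this host; the function name kernel_modules_present is hypothetical:

```shell
# Succeeds if a module directory exists for the given kernel release.
# $1: kernel release (defaults to the running kernel, `uname -r`)
# $2: modules root (defaults to /lib/modules)
kernel_modules_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -d "$root/$release" ]
}

# Example: report on the currently running kernel.
if kernel_modules_present; then
    echo "modules for running kernel $(uname -r) are present"
else
    echo "modules for running kernel $(uname -r) are missing"
fi
```

On openqaworker1 in the state described here, this check would be expected to report the modules as missing, since only the directory for the newer kernel existed.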

According to others (https://suse.slack.com/archives/C02AJ1E568M/p1696591927454759), purge-kernels.service is responsible for the cleanup. It did indeed run 1 day and 3 hours ago, while the system has an uptime of 5 days. So something triggered the service (why?), and for some reason it removed modules that the running kernel still needs (why?).

Actions #9

Updated by nicksinger 7 months ago

  • Status changed from Workable to In Progress
Actions #10

Updated by openqa_review 7 months ago

  • Due date set to 2023-10-21

Setting due date based on mean cycle time of SUSE QE Tools

Actions #11

Updated by nicksinger 7 months ago

  • Subject changed from [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount size:M to [alert] Failed systemd services - openqaworker1 - proc-sys-fs-binfmt_misc.mount, kernel modules already removed with old kernel still running size:M
  • Status changed from In Progress to Resolved

The machine has rebooted in the meantime (a reboot was already scheduled before the weekend when I checked), and I have no good idea how to debug further why this happened only on this single host. I am therefore resolving the ticket for now; we can use it as a reference if the same happens again.
