action #107158

closed

[osd] failing systemd service on "storage": "systemd-udev-settle" size:M

Added by okurz about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2022-02-18
Due date:
% Done: 0%
Estimated time:

Description

Observation

# systemctl status systemd-udev-settle.service
systemd-udev-settle.service - Wait for udev To Complete Device Initialization
     Loaded: loaded (/usr/lib/systemd/system/systemd-udev-settle.service; static)
     Active: failed (Result: exit-code) since Sun 2022-02-13 03:47:38 CET; 1 week 1 day ago
       Docs: man:systemd-udev-settle.service(8)
   Main PID: 709 (code=exited, status=1/FAILURE)

Feb 13 03:45:38 storage systemd[1]: Starting Wait for udev To Complete Device Initialization...
Feb 13 03:45:39 storage udevadm[709]: systemd-udev-settle.service is deprecated. Please fix wickedd.service not to pull it in.
Feb 13 03:47:38 storage systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 03:47:38 storage systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
Feb 13 03:47:38 storage systemd[1]: Failed to start Wait for udev To Complete Device Initialization.

Suggestions

  • mkittler had already observed this error in the past
  • Research the deprecation message. Maybe we need to report a bug against wicked? I don't think we made any manual changes there
  • Check the service on other machines in our infrastructure
  • Maybe an interface going down and up again triggered the service
Actions #2

Updated by jbaier_cz about 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz
Actions #3

Updated by jbaier_cz about 2 years ago

  • Status changed from In Progress to Resolved

The service is indeed pulled by wickedd.service:

Wants=wickedd-nanny.service wickedd-dhcp6.service wickedd-dhcp4.service wickedd-auto4.service systemd-udev-settle.service
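One way to double-check this dependency is to look for the unit name in the Wants= property. The sketch below is only illustrative: wants_unit is a hypothetical helper, and it is fed the line quoted above rather than live output.

```shell
#!/bin/sh
# wants_unit LINE UNIT: succeed if UNIT appears in a "Wants=..." property line.
# Hypothetical helper for illustration; not part of systemd or wicked.
wants_unit() {
    case " ${1#Wants=} " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# The property line as quoted in this ticket.
line='Wants=wickedd-nanny.service wickedd-dhcp6.service wickedd-dhcp4.service wickedd-auto4.service systemd-udev-settle.service'

if wants_unit "$line" systemd-udev-settle.service; then
    echo "wickedd.service pulls in systemd-udev-settle.service"
fi
```

On a live system the line could come from `systemctl show -p Wants wickedd.service`, and `systemctl list-dependencies --reverse systemd-udev-settle.service` should list every unit that pulls the service in.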

From man page of systemd-udev-settle.service:

It is a crude way to wait until "all" hardware has been discovered.

Using this service is not recommended.

So this is how wickedd ensures that all available network hardware has been discovered. The service can probably be safely masked (which I did to remedy this issue), as it is really only needed when LVM is involved (and even that might no longer be true). In case of a problem, we need to modify wickedd.service to wait explicitly for the correct interfaces.
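For reference, masking could look roughly like the sketch below. The exact commands used on "storage" are not recorded here, so treat this as an assumption; the run wrapper and mask_settle are hypothetical helpers, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mask_settle() {
    # Masking links the unit to /dev/null so nothing can start it anymore.
    run systemctl mask systemd-udev-settle.service
    # Clear the "failed" state left over from the last start attempt.
    run systemctl reset-failed systemd-udev-settle.service
}

mask_settle
```

Afterwards `systemctl is-enabled systemd-udev-settle.service` should report "masked", and `systemctl unmask` reverts the change.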

However, the issue itself (failing systemd-udev-settle.service) is caused by something else.
From the journal logs:

Feb 13 03:46:40 storage systemd-udevd[705]: sr0: Worker [749] processing SEQNUM=12763 is taking a long time
Feb 13 03:47:38 storage systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 03:47:38 storage systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
Feb 13 03:47:38 storage systemd[1]: Failed to start Wait for udev To Complete Device Initialization.

The problem is in the /dev/sr0 device

Feb 13 03:45:40 storage kernel: scsi 19:0:0:0: CD-ROM            ATEN     Virtual CDROM    YS0J PQ: 0 ANSI: 0 CCS
Feb 13 03:45:40 storage kernel: scsi 19:0:0:0: Attached scsi generic sg13 type 5
Feb 13 03:45:40 storage kernel: sr 19:0:0:0: [sr0] scsi3-mmc drive: 40x/40x cd/rw xa/form2 cdda tray
Feb 13 03:45:40 storage kernel: cdrom: Uniform CD-ROM driver Revision: 3.20
Feb 13 03:45:40 storage kernel: sr 19:0:0:0: Attached scsi CD-ROM sr0

The problem is that udev also tries to probe this device, which fails (as seen above). This can be confirmed with blkid /dev/sr0, which hangs for 5 minutes, way beyond the default limit in udev. (The default value is 120 seconds, as can be found in the man page for udevadm.)
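The journal excerpt in the description fits that 120 second limit: the unit started at 03:45:38 and failed at 03:47:38. A small sketch of the arithmetic, where to_seconds is a hypothetical helper:

```shell
#!/bin/sh
# to_seconds HH:MM:SS: print the number of seconds since midnight.
# Runs in a subshell so the IFS change does not leak to the caller.
to_seconds() (
    IFS=:
    set -- $1
    # Strip one leading zero so values like "08" are not read as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

start=$(to_seconds 03:45:38)   # "Starting Wait for udev To Complete ..."
end=$(to_seconds 03:47:38)     # "Main process exited, ... status=1/FAILURE"

echo $(( end - start ))        # prints 120
```

So the service did not fail on its own; it simply waited until its timeout expired while the sr0 event was still being processed.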

There are 3 possible solutions for this:

  1. Mask the systemd-udev-settle.service unit and not call udevadm settle during the boot explicitly (the chosen solution).
  2. Edit the timeout for udev to at least 300 seconds to be able to probe /dev/sr0. This will also delay the boot for the same amount as the wickedd.service is waiting for that device.
  3. Fix / remove the misbehaving device as it is probably just an attached virtual CD-ROM drive without media.
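For option 3, a SCSI device can usually be detached from the running kernel through its sysfs delete attribute. The sketch below only prints the command instead of running it; detach_cmd is a hypothetical helper, and it assumes the drive is still exposed as sr0. Such a change is runtime-only, so the drive would come back after a reboot or when it is attached again.

```shell
#!/bin/sh
# detach_cmd NAME: print the command that would detach SCSI block device NAME.
# Assumption: the device is exposed as /sys/block/NAME, as is usual for sr0.
detach_cmd() {
    printf 'echo 1 > /sys/block/%s/device/delete\n' "$1"
}

# Review the printed command, then run it as root if it looks right.
detach_cmd sr0
```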
Actions #4

Updated by okurz about 2 years ago

  • Status changed from Resolved to In Progress

I assume you set it to "Resolved" by mistake because you evaluated options but haven't stated a resolution yet. I would favor option 3 but also like openSUSE packages to improve. So maybe still report a bug to wicked?

Actions #5

Updated by jbaier_cz about 2 years ago

okurz wrote:

I assume you set it to "Resolved" by mistake because you evaluated options but haven't stated a resolution yet. I would favor option 3 but also like openSUSE packages to improve. So maybe still report a bug to wicked?

No, I masked the service (which will probably be removed in the future anyway). I don't think it is a bug to depend on a deprecated feature as long as it will be fixed for the next release, and as far as I know there will be no wicked by default in the next Leap, so it is somewhat "resolved".

So the last question remains: Can we disable the CD-ROM drive and can we prevent it from reappearing? (I would answer that with a yes and no in this order. So it is not a permanent solution.)

The error is gone (mitigated) and none of the services is currently failing, so we can definitely at least decrease the urgency; but no, it was not a mistake, I already did state a resolution for that.

Actions #6

Updated by okurz about 2 years ago

  • Status changed from In Progress to Resolved

ah, ok. Then, well, resolved. I missed the part that you actually masked the service then.

Actions #7

Updated by jbaier_cz about 2 years ago

I will try to be more explicit next time. Some of that might have gotten lost while I was rephrasing my comment. I tried to provide maximum detail from my investigation; I didn't want to just state "I just disabled the service" and close the ticket :)
