Project

General

Profile

Actions

tickets #173533

open

kanidmd reload failure

Added by crameleon 4 months ago. Updated 3 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
FreeIPA/Kanidm
Target version:
-
Start date:
2024-12-02
Due date:
% Done:

0%

Estimated time:

Description

Monitoring complains about failing reload after certificate renewals on kani{1,2}. It seems kanidmd takes the request but then does not process it completely, until it times out:

2024-12-02T00:17:16.781179+00:00 kani1 sudo:     cert : PWD=/home/cert ; USER=root ; COMMAND=/usr/bin/systemctl try-reload-or-restart kanidmd
2024-12-02T00:17:16.782933+00:00 kani1 sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
2024-12-02T00:17:16.788565+00:00 kani1 systemd[1]: Reloading Kanidm Identity Server...
2024-12-02T00:17:16.789862+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO     i [info]: Reloaded http tls acceptor
2024-12-02T00:17:16.789891+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO     i [info]: Reloaded ldap tls acceptor
...
2024-12-02T00:18:46.889486+00:00 kani1 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
2024-12-02T00:18:46.889587+00:00 kani1 systemd[1]: Reload failed for Kanidm Identity Server.
Actions #1

Updated by crameleon 4 months ago

  • Private changed from Yes to No
Actions #2

Updated by crameleon 4 months ago

It seems reload was only introduced relatively recently: https://github.com/kanidm/kanidm/pull/3144. That'd explain why the issue was not present earlier, as our try-reload-or-restart falls back to restart if a unit does not support reload.

Actions #3

Updated by crameleon 4 months ago

  • Assignee set to firstyear
Actions #4

Updated by firstyear 4 months ago

Actions #5

Updated by firstyear 4 months ago

maybe I'll have to add some sleep statements. Could be reloading too fast for systemd to cope.

Actions #6

Updated by crameleon 4 months ago

I don't think systemd cares how fast it happens, it only wants to receive the notification about the reload being done (READY=1). I do note you have this in the code, but is it ever reaching there?

Actions #7

Updated by firstyear 4 months ago

Yes it does, separate testing shows it working correctly. I suspect race condition.

Actions #8

Updated by firstyear 4 months ago

1.4.4 has a delay here to help systemd, and has extra logging to be extremely clear that we got to the reload point.

Actions #9

Updated by crameleon 3 months ago

Thanks; with the new version we get the information that kanidmd finished reloading, but systemd is not having any of it. :-/

Jan 11 16:40:26 kani2 systemd[1]: Reloading Kanidm Identity Server...
...
Jan 11 16:40:31 kani2 kanidmd[1231]: 00000000-0000-0000-0000-000000000000 INFO     i [info]: Reload complete
...
Jan 11 16:41:56 kani2 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
Jan 11 16:41:56 kani2 systemd[1]: Reload failed for Kanidm Identity Server.
Actions #10

Updated by crameleon 3 months ago

I am able to reproduce this in my test environment as well, and added some more debugging:

-                let _ = sd_notify::notify(true, &[sd_notify::NotifyState::Ready]);
+                let _ = sd_notify::notify(false, &[
+                    sd_notify::NotifyState::Ready,
+                    sd_notify::NotifyState::Status("Kanidm is operational")
+                    ],
+                );

My STATUS message gets handled by systemd just fine (my string appears in the status output of the service), but the READY signal is not.

The RELOADING signal might need to be combined with MONOTONIC_USEC? Unfortunately this seems not supported by the sd_notify crate.

Actions #11

Updated by crameleon 3 months ago

I propose something like this:

https://github.com/kanidm/kanidm/pull/3355

Jan 11 18:42:31 localhost systemd[1]: Reloading Kanidm Identity Server...
Jan 11 18:42:31 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO     i [info]: Reloaded http tls acceptor
Jan 11 18:42:36 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO     i [info]: Reload complete
Jan 11 18:42:36 localhost systemd[1]: Reloaded Kanidm Identity Server.
Actions #12

Updated by crameleon 3 months ago

(The patch is a bit bigger than I'd like, please see my comments in the PR)

Actions #13

Updated by crameleon 3 days ago

Hi @firstyear, I am a bit confused, the servers have 1.5.0 installed, but still have this issue. Any ideas?

Actions #14

Updated by firstyear 3 days ago

At this point, I think an issue needs to be raised against systemd - we are following all their documentation to the letter about sd-notify, and they are still killing us. All of the fixes are in 1.5.0 and so now it's not my bug.

Until then, you need to just call restart instead.

https://bugzilla.opensuse.org/show_bug.cgi?id=1240346

Actions #15

Updated by crameleon 3 days ago

It's odd the status text is not shown either in systemctl status kanidmd, but I am pretty sure it did in my 15.6 test machine back when I made that patch.
Well then, for now I put Type=notify in the unit overrides so the reload action is disabled, and our automated try-reload-or-restart will always restart.

Actions

Also available in: Atom PDF