tickets #173533
openkanidmd reload failure
0%
Description
Monitoring complains about failing reload after certificate renewals on kani{1,2}. It seems kanidmd takes the request but then does not process it completely, until it times out:
2024-12-02T00:17:16.781179+00:00 kani1 sudo: cert : PWD=/home/cert ; USER=root ; COMMAND=/usr/bin/systemctl try-reload-or-restart kanidmd
2024-12-02T00:17:16.782933+00:00 kani1 sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
2024-12-02T00:17:16.788565+00:00 kani1 systemd[1]: Reloading Kanidm Identity Server...
2024-12-02T00:17:16.789862+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded http tls acceptor
2024-12-02T00:17:16.789891+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded ldap tls acceptor
...
2024-12-02T00:18:46.889486+00:00 kani1 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
2024-12-02T00:18:46.889587+00:00 kani1 systemd[1]: Reload failed for Kanidm Identity Server.
Updated by crameleon 4 months ago
It seems reload was only introduced relatively recently: https://github.com/kanidm/kanidm/pull/3144. That'd explain why the issue was not present earlier, as our try-reload-or-restart falls back to restart if a unit does not support reload.
Updated by firstyear 4 months ago
When you get those messages it means the reload is complete.
This will be a bug in systemd I expect.
https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html#RELOADING=1
https://github.com/kanidm/kanidm/blob/master/server/daemon/src/main.rs#L783
Updated by crameleon 3 months ago
Thanks; with the new version we get the information that kanidmd finished reloading, but systemd is not having any of it. :-/
Jan 11 16:40:26 kani2 systemd[1]: Reloading Kanidm Identity Server...
...
Jan 11 16:40:31 kani2 kanidmd[1231]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reload complete
...
Jan 11 16:41:56 kani2 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
Jan 11 16:41:56 kani2 systemd[1]: Reload failed for Kanidm Identity Server.
Updated by crameleon 3 months ago
I am able to reproduce this in my test environment as well, and added some more debugging:
- let _ = sd_notify::notify(true, &[sd_notify::NotifyState::Ready]);
+ let _ = sd_notify::notify(false, &[
+ sd_notify::NotifyState::Ready,
+ sd_notify::NotifyState::Status("Kanidm is operational")
+ ],
+ );
My STATUS message gets handled by systemd just fine (my string appears in the status output of the service), but the READY signal is not.
The RELOADING signal might need to be combined with MONOTONIC_USEC? Unfortunately this seems not supported by the sd_notify crate.
Updated by crameleon 3 months ago
I propose something like this:
https://github.com/kanidm/kanidm/pull/3355
Jan 11 18:42:31 localhost systemd[1]: Reloading Kanidm Identity Server...
Jan 11 18:42:31 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded http tls acceptor
Jan 11 18:42:36 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reload complete
Jan 11 18:42:36 localhost systemd[1]: Reloaded Kanidm Identity Server.
Updated by crameleon 3 days ago
Hi @firstyear, I am a bit confused, the servers have 1.5.0 installed, but still have this issue. Any ideas?
Updated by firstyear 3 days ago
At this point, I think an issue needs to be raised against systemd - we are following all their documentation to the letter about sd-notify, and they are still killing us. All of the fixes are in 1.5.0 and so now it's not my bug.
Until then, you need to just call restart instead.
Updated by crameleon 3 days ago
It's odd the status text is not shown either in systemctl status kanidmd
, but I am pretty sure it did in my 15.6 test machine back when I made that patch.
Well then, for now I put Type=notify
in the unit overrides so the reload action is disabled, and our automated try-reload-or-restart will always restart.