tickets #173533
openkanidmd reload failure
0%
Description
Monitoring complains about failing reload after certificate renewals on kani{1,2}. It seems kanidmd takes the request but then does not process it completely, until it times out:
2024-12-02T00:17:16.781179+00:00 kani1 sudo: cert : PWD=/home/cert ; USER=root ; COMMAND=/usr/bin/systemctl try-reload-or-restart kanidmd
2024-12-02T00:17:16.782933+00:00 kani1 sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
2024-12-02T00:17:16.788565+00:00 kani1 systemd[1]: Reloading Kanidm Identity Server...
2024-12-02T00:17:16.789862+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded http tls acceptor
2024-12-02T00:17:16.789891+00:00 kani1 kanidmd[1223]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded ldap tls acceptor
...
2024-12-02T00:18:46.889486+00:00 kani1 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
2024-12-02T00:18:46.889587+00:00 kani1 systemd[1]: Reload failed for Kanidm Identity Server.
Updated by crameleon about 1 month ago
It seems reload was only introduced relatively recently: https://github.com/kanidm/kanidm/pull/3144. That'd explain why the issue was not present earlier, as our try-reload-or-restart falls back to restart if a unit does not support reload.
Updated by firstyear about 1 month ago
When you get those messages it means the reload is complete.
This will be a bug in systemd I expect.
https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html#RELOADING=1
https://github.com/kanidm/kanidm/blob/master/server/daemon/src/main.rs#L783
Updated by firstyear about 1 month ago
maybe I'll have to add some sleep statements. Could be reloading too fast for systemd to cope.
Updated by crameleon about 1 month ago
I don't think systemd cares how fast it happens, it only wants to receive the notification about the reload being done (READY=1). I do note you have this in the code, but is it ever reaching there?
Updated by firstyear about 1 month ago
Yes it does, separate testing shows it working correctly. I suspect race condition.
Updated by firstyear about 1 month ago
1.4.4 has a delay here to help systemd, and has extra logging to be extremely clear that we got to the reload point.
Updated by crameleon 3 days ago
Thanks; with the new version we get the information that kanidmd finished reloading, but systemd is not having any of it. :-/
Jan 11 16:40:26 kani2 systemd[1]: Reloading Kanidm Identity Server...
...
Jan 11 16:40:31 kani2 kanidmd[1231]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reload complete
...
Jan 11 16:41:56 kani2 systemd[1]: kanidmd.service: Reload operation timed out. Killing reload process.
Jan 11 16:41:56 kani2 systemd[1]: Reload failed for Kanidm Identity Server.
Updated by crameleon 3 days ago
I am able to reproduce this in my test environment as well, and added some more debugging:
- let _ = sd_notify::notify(true, &[sd_notify::NotifyState::Ready]);
+ let _ = sd_notify::notify(false, &[
+ sd_notify::NotifyState::Ready,
+ sd_notify::NotifyState::Status("Kanidm is operational")
+ ],
+ );
My STATUS message gets handled by systemd just fine (my string appears in the status output of the service), but the READY signal is not.
The RELOADING signal might need to be combined with MONOTONIC_USEC? Unfortunately this seems not supported by the sd_notify crate.
Updated by crameleon 3 days ago
I propose something like this:
https://github.com/kanidm/kanidm/pull/3355
Jan 11 18:42:31 localhost systemd[1]: Reloading Kanidm Identity Server...
Jan 11 18:42:31 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reloaded http tls acceptor
Jan 11 18:42:36 localhost kanidmd[23643]: 00000000-0000-0000-0000-000000000000 INFO i [info]: Reload complete
Jan 11 18:42:36 localhost systemd[1]: Reloaded Kanidm Identity Server.