action #165434
OSD SSL certificates not always refreshed within expected time, probably only after system reboots size:S
Description
Observation
See
It is expected that the SSL certificates are refreshed, as visible on the left-hand side, so that the remaining validity is always at least 3 weeks. Since around 2024-05-19 the certificates no longer seem to be refreshed on that schedule, although we have never run out of a valid certificate. Today, 2024-08-18, the remaining validity dropped to just below the 6-day alerting threshold shortly before OSD rebooted, which seemingly triggered a refresh.
Acceptance criteria
- AC1: OSD SSL certificates are consistently refreshed while their remaining validity is still well above the alerting threshold
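AC1 could be spot-checked by hand with `openssl x509 -checkend`, which exits non-zero if a certificate expires within the given number of seconds. The following is only a sketch: the helper names are hypothetical, the certificate path must be supplied by the caller, and the 6-day value is the alerting threshold mentioned in the observation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, the certificate path is
# supplied by the caller and is not taken from the ticket.

# Convert a number of days into seconds, the unit `-checkend` expects.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Succeed if the PEM certificate in $1 stays valid for at least $2 more days.
cert_valid_for_days() {
    openssl x509 -noout -checkend "$(days_to_seconds "$2")" -in "$1" >/dev/null
}
```

Possible usage, with a placeholder path: `cert_valid_for_days /path/to/cert.pem 6 || echo "below alerting threshold"`. Passing 21 instead of 6 would check against the expected 3 weeks of validity.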
Suggestions
- Check whether the certificate on the filesystem is expired or whether nginx just serves the old certificate
- Understand how the certificate management is currently handled with https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/certificates/dehydrated.sls?ref_type=heads
- Check logs of the dehydrated and related services on OSD. Maybe the certificates are refreshed but we never reload the webserver properly until a reboot?
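The first suggestion can be sketched as a comparison of the expiry date of the certificate on disk with the one the webserver actually serves. This is a minimal sketch, not the procedure used in this ticket: the function names are hypothetical, and the certificate path and host name have to be supplied by the caller.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; certificate path and host
# name are passed in by the caller and are not taken from the ticket.

# Print the notAfter date of a PEM certificate file.
cert_enddate_file() {
    openssl x509 -noout -enddate -in "$1" | cut -d= -f2
}

# Print the notAfter date of the certificate served on port 443 of host $1.
cert_enddate_served() {
    openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2
}

# Print "match" if both dates are identical, otherwise "stale".
# "stale" suggests the webserver was not reloaded after a renewal.
compare_enddates() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "stale"
    fi
}
```

Possible usage, with placeholder arguments: `compare_enddates "$(cert_enddate_file /path/to/fullchain.pem)" "$(cert_enddate_served openqa.suse.de)"`. A "stale" result while the file on disk is fresh would point at a missing webserver reload rather than a failed renewal.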
Updated by okurz 3 months ago
- Subject changed from OSD SSL certificates not always refreshed within expected time, probably only after system reboots to OSD SSL certificates not always refreshed within expected time, probably only after system reboots size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger 3 months ago
- Status changed from Workable to Resolved
- Assignee set to nicksinger
We have a hook script in /etc/dehydrated/postrun-hooks.d/reload-webserver.sh which is populated based on the grain webserver (see https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/certificates/dehydrated.sls?ref_type=heads#L37), but this grain was never changed to nginx and was still set to apache2. I manually changed /etc/salt/grains to contain webserver: nginx, called salt 'openqa.suse.de' saltutil.sync_grains and verified with:
openqa:/etc/dehydrated # salt 'openqa.suse.de' grains.get webserver
openqa.suse.de:
nginx
and deployed with:
openqa:/etc/dehydrated # salt 'openqa.suse.de' state.sls_id /etc/dehydrated/postrun-hooks.d/reload-webserver.sh certificates.dehydrated
openqa.suse.de:
----------
          ID: /etc/dehydrated/postrun-hooks.d/reload-webserver.sh
    Function: file.managed
      Result: True
     Comment: File /etc/dehydrated/postrun-hooks.d/reload-webserver.sh updated
     Started: 14:27:29.212692
    Duration: 45.262 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -1,2 +1,2 @@
                   #!/bin/sh
                  -systemctl reload apache2
                  +systemctl reload nginx
Summary for openqa.suse.de
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time:  45.262 ms
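To catch this kind of mismatch earlier, the service named in the hook could be compared with the webserver that is expected to run. The following is a rough sketch under the assumption that the hook keeps the two-line form shown in the diff above; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical and the hook is assumed to
# contain a single "systemctl reload <service>" line as in the diff above.

# Print the service name that the hook script in $1 reloads.
hook_target() {
    sed -n 's/^systemctl reload //p' "$1"
}

# Compare the hook target in $1 with the expected webserver in $2.
# Prints a short verdict and returns non-zero on a mismatch.
check_hook() {
    target=$(hook_target "$1")
    if [ "$target" = "$2" ]; then
        echo "ok: hook reloads $target"
    else
        echo "mismatch: hook reloads $target, expected $2"
        return 1
    fi
}
```

Possible usage: `check_hook /etc/dehydrated/postrun-hooks.d/reload-webserver.sh nginx`. The expected name could also be taken from the output of `salt-call grains.get webserver`, which requires parsing that is left out here.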
Unfortunately, I reloaded nginx while testing, so we will have to wait another 3 weeks before we see whether it worked. Our alerts will let us know.
Updated by nicksinger about 2 months ago
- Related to action #167458: openqa.oqa.prg2.suse.org SAN validity alert added
Updated by livdywan 20 days ago
- Copied to action #169078: dashboard.qam.suse.de SSL certificate not deployed within expiry size:S added