action #103539
closed
Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M
0%
Description
Observation
As was done for OSD in #103149, we need to update the SSL certificates for all domains linked to monitor.qa.suse.de (4 domains). See https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/615
Acceptance criteria
- AC1: monitor.qa.suse.de has a valid certificate
- AC2: stats.openqa-monitor.qa.suse.de has a valid certificate
- AC3: there is monitoring for the certificate, same as we have for osd
- AC4: The certificates are automatically refreshed
Suggestions
- Take a look at what we do in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/615, generalize it so it can be used for other hosts, and apply the same to monitor.qa.suse.de
- Ensure the approach works for all four domains monitor.qa.suse.de, openqa-monitor.qa.suse.de, stats.openqa-monitor.qa.suse.de, stats.monitor.qa.suse.de
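A quick manual check of the four domains could look like the following minimal sketch. It is not part of salt-states-openqa; the names DOMAINS, days_until_expiry, fetch_not_after and report are made up for illustration, and it assumes each domain serves HTTPS on port 443. An expired or otherwise untrusted certificate makes the TLS handshake fail, so such a domain shows up as FAILED rather than with a negative day count.

```python
import socket
import ssl
from datetime import datetime, timezone

# The four domains named in this ticket.
DOMAINS = [
    "monitor.qa.suse.de",
    "openqa-monitor.qa.suse.de",
    "stats.openqa-monitor.qa.suse.de",
    "stats.monitor.qa.suse.de",
]


def days_until_expiry(not_after, now=None):
    """Return whole days between `now` and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a verified TLS connection and return the peer cert's notAfter."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def report(domains=DOMAINS):
    """Print the remaining validity for each domain, or the failure reason."""
    for domain in domains:
        try:
            days = days_until_expiry(fetch_not_after(domain))
            print(f"{domain}: {days} days left")
        except OSError as error:  # also covers ssl.SSLError and timeouts
            print(f"{domain}: FAILED ({error})")
```

Calling report() from a machine that can reach the hosts prints one line per domain. This only covers a one-off check and does not replace the monitoring asked for in AC3.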
Files
Updated by livdywan almost 3 years ago
- Subject changed from Update SSL certificates on monitor.qa.suse.de with dehydrated and salt, same as on OSD to Update expired SSL certificate on monitor.qa.suse.de with dehydrated and salt, same as on OSD size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger almost 3 years ago
- Status changed from Workable to In Progress
Updated by nicksinger almost 3 years ago
- Status changed from In Progress to Feedback
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/617
Changes are ready to be merged. For testing I already ran the commands on OSD, so even before anyone merges this we already have a valid certificate on https://monitor.qa.suse.de again :)
Updated by okurz almost 3 years ago
- Related to action #103527: osd-deployment pipelines fail and alerts are not handled size:M added
Updated by livdywan almost 3 years ago
monitor.qa.suse.de and stats.openqa-monitor.qa.suse.de seem to work. Are there any remaining steps wrt making sure renewal happens automatically?
Updated by livdywan almost 3 years ago
- Priority changed from Urgent to Normal
Lowering priority since, as far as I can tell, the sites work without exceptions in the browser
Updated by okurz almost 3 years ago
I suggest adding monitoring, same as we already have for OSD
Updated by nicksinger almost 3 years ago
- Description updated (diff)
- Due date set to 2022-01-07
- Status changed from Feedback to Workable
Monitoring is left to do. Will carry this over into the new year. If anybody is bored feel free to take over :) Otherwise I will add the monitoring in the new year. Our current certificate is valid until the 20th of Jan so we have headroom to implement this monitoring before the current cert expires
Updated by okurz almost 3 years ago
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/634 should be a start to bring in the corresponding telegraf config. For this we should read out the domain list from pillars, but I realized that pillars have not been updated on monitor.qa since 2021-04, so that should be the first thing to check. Then in grafana copy the certificate-related panel from the webui dashboard into https://monitor.qa.suse.de/d/EML0bpuGk/monitoring
Updated by livdywan almost 3 years ago
- Due date changed from 2022-01-07 to 2022-01-14
- Priority changed from Normal to High
Seems like a good thing we already set the due date prematurely. @nicksinger I recall your saying you were already looking into it. If you can, please update here at the next opportunity - or if there's something others can help you with, you're welcome to ask.
Updated by nicksinger over 2 years ago
- Status changed from Workable to In Progress
okurz wrote:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/634 should be a start to bring in the corresponding telegraf config. For this we should read out the domain list from pillars, but I realized that pillars have not been updated on monitor.qa since 2021-04, so that should be the first thing to check. Then in grafana copy the certificate-related panel from the webui dashboard into https://monitor.qa.suse.de/d/EML0bpuGk/monitoring
Not sure how you determined that the pillar cache had not been updated for that long. According to https://salt-users.narkive.com/vJmNliU0/why-must-saltutil-refresh-pillar-be-run#post2 this should happen with every highstate. I can't imagine we made no changes to that host in that time frame. Also, checking with salt-call pillar.get "dehydrated", I see the most recent pillar data (which, according to the post on salt-users, should come from the cache).
Updated by nicksinger over 2 years ago
I've opened https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/639 as a first draft of how I imagine the monitoring. It should be more generic and work for every host we eventually add in the future. I still need to figure out how to properly loop over the SANs of the hosts.txt entries per host to have a proper check for each of them (this is what happens in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/639/diffs#85e7a4e39662e846e9a7e8c1660d41d28a0389c3_0_8) - it might even be possible to shrink the certificates.conf down to a single [[inputs.x509_cert]] section
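To illustrate the idea of a single [[inputs.x509_cert]] section, here is a minimal Python sketch. It is not the template from the MR, which may look quite different. It assumes that each hosts.txt line holds a primary domain followed by its SANs, separated by whitespace, with # starting a comment; parse_hosts and render_telegraf are hypothetical helper names. The only Telegraf detail relied on is that the x509_cert input accepts a list of URLs in its sources option.

```python
def parse_hosts(text):
    """Collect every domain name from whitespace-separated host lines.

    Assumed format: "primary.example san1.example san2.example" per line,
    with "#" starting a comment. Duplicates are dropped, order is kept.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        for name in line.split():
            if name not in names:
                names.append(name)
    return names


def render_telegraf(names, port=443):
    """Render one x509_cert input section that checks all given names."""
    sources = ", ".join(f'"https://{name}:{port}"' for name in names)
    return f"[[inputs.x509_cert]]\n  sources = [{sources}]\n"
```

Feeding the hosts.txt content through render_telegraf(parse_hosts(...)) yields one section whose sources list contains one https:// URL per domain and SAN, so newly added hosts would be picked up without another section.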
Updated by livdywan over 2 years ago
- Due date changed from 2022-01-14 to 2022-01-18
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/639 is still in progress, I'm starting to get worried that we'll see alerts for broken certificates after all... bumping the due date to tomorrow in any case
@nicksinger Please prepare manual steps to do the update tomorrow
Updated by livdywan over 2 years ago
nicksinger wrote:
Monitoring is left to do. Will carry this over into the new year. If anybody is bored feel free to take over :) Otherwise I will add the monitoring in the new year. Our current certificate is valid until the 20th of Jan so we have headroom to implement this monitoring before the current cert expires
Apparently we're at Feb 14 now, which suggests automated renewal worked at least once.
Updated by nicksinger over 2 years ago
cdywan wrote:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/639 is still in progress, I'm starting to get worried that we'll see alerts for broken certificates after all... bumping the due date to tomorrow in any case
@nicksinger Please prepare manual steps to do the update tomorrow
Just to make it clear here in the ticket too:
AC4: The certificates are automatically refreshed
is already done. Therefore we don't need to worry about an expiring certificate. We also won't see any alerts, because creating the monitoring/alerts is exactly why this ticket is still open. I polished up https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/639 and think it can be merged now to receive the metrics. A dashboard for grafana is still left to do, but I'd like to cover it in a separate MR
Updated by nicksinger over 2 years ago
I came up with a new dashboard including alerts and proper instructions here: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/640
It might not work after merging because I couldn't really test the template for the dashboard.
Updated by livdywan over 2 years ago
- File grafik.png added
- Status changed from In Progress to Feedback
nicksinger wrote:
I came up with a new dashboard including alerts and proper instructions here: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/640
It might not work after merging because I couldn't really test the template for the dashboard.
The MR got merged. Looks very nice
Updated by okurz over 2 years ago
- Due date deleted (2022-01-18)
- Status changed from Feedback to Resolved
I agree. All looks good