action #169357

closed

monitoring+alerting for dashboard.qam.suse.de SSL certificate not deployed within expiry size:S

Added by okurz 5 months ago. Updated 19 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Start date:
2024-11-05
Due date:
% Done:

0%

Estimated time:

Description

Observation

The certificate for dashboard.qam.suse.de expired on 2024-10-30. See #169078 for the resolution. This ticket covers monitoring and alerting.

Acceptance criteria

  • AC1: We are alerted if the externally available certificate on dashboard.qam.suse.de is valid for only a limited number of days

Suggestions

  • Follow #169078 for the actual fix and then focus on monitoring
  • The expired certificate was not caught by any alert; it only came to light because a user reported it
    • Apparently dashboard.qam.suse.de is not yet monitored on SSL Certificate Alerts. Shouldn't it be? It is not covered there because qem-dashboard is not controlled by the OSD salt states but deployed by ansible playbooks. We could consider monitoring the certificate validity from outside, e.g. from another host
  • Look at how we currently monitor certificates for OSD in https://gitlab.suse.de/openqa/salt-states-openqa
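
The idea of checking the certificate from another host could look roughly like the following. This is only a minimal sketch, not what salt-states-openqa actually does: the function names, the 14-day threshold and the port are assumptions made for illustration, and it assumes the checking host trusts the CA that issued the certificate.

```python
import socket
import ssl
from datetime import datetime, timezone

# Hypothetical threshold; the real alert limit is not stated in this ticket.
THRESHOLD_DAYS = 14


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the served certificate's notAfter field as reported by the ssl module."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The handshake raises ssl.SSLCertVerificationError if the certificate
        # is already expired or untrusted; a monitor should alert on that too.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: datetime) -> float:
    """Days between a timezone-aware `now` and the certificate expiry (negative if expired)."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_alert(not_after: str, now: datetime, threshold_days: float = THRESHOLD_DAYS) -> bool:
    """True if the certificate expires in fewer than `threshold_days` days."""
    return days_remaining(not_after, now) < threshold_days


def check(host: str) -> bool:
    """Fetch the certificate from `host` over the network and evaluate it against the threshold."""
    return needs_alert(fetch_not_after(host), datetime.now(timezone.utc))
```

Calling check("dashboard.qam.suse.de") from a host outside the ansible-managed machine would return True once the certificate is within the threshold. The date handling is kept separate from the network call so it can be tested without a connection.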

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #169078: dashboard.qam.suse.de SSL certificate not deployed within expiry size:S - Resolved - robert.richardson - 2024-11-29

Actions #1

Updated by okurz 5 months ago

  • Copied from action #169078: dashboard.qam.suse.de SSL certificate not deployed within expiry size:S added
Actions #2

Updated by okurz 28 days ago

  • Target version changed from Tools - Next to Ready
Actions #3

Updated by jbaier_cz 27 days ago

  • Assignee set to jbaier_cz
Actions #4

Updated by jbaier_cz 26 days ago

  • Status changed from Workable to In Progress
Actions #5

Updated by jbaier_cz 26 days ago

I created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/993 and https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1400, which should extend the current framework for certificate monitoring. I guess the dashboard still needs some tuning to show the values. I will look at that in the next steps.

Actions #6

Updated by jbaier_cz 26 days ago

And I just spotted that we are also not monitoring the new certificate for loki because of how the list of panels is constructed.

Actions #7

Updated by openqa_review 26 days ago

  • Due date set to 2025-03-21

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by jbaier_cz 25 days ago

  • Status changed from In Progress to Feedback

Changes deployed. The new dashboards are already there (loki is fixed). Waiting for the first data about dashboard.qam.suse.de to arrive in Grafana so that I can validate it.

Actions #9

Updated by jbaier_cz 25 days ago

The data is there; it just turned out that the correct common name for the host is different. Fixed by https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/996

Actions #10

Updated by jbaier_cz 19 days ago

  • Due date deleted (2025-03-21)
  • Status changed from Feedback to Resolved

Dashboard is up and running.
