action #126962

closed

Use templating for all provisioned "unified alerts" where the original alerts were part of templated dashboards

Added by mkittler about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-03-30
Due date: 2023-04-15
% Done: 0%
Estimated time:
Tags:

Description

The alerts that were previously part of the generic and workers dashboards have already been templated as part of #125642. Check out https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/816/diffs to see how it was done. The same should be done for the alerts that were part of the web UI services dashboard (salt-states-openqa/monitoring/grafana/webui.services.json.template) and the certificates dashboard (salt-states-openqa/monitoring/grafana/certificates.json.template).
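To illustrate the general idea (this is only a sketch, not what the linked merge request actually does): one shared rule template is rendered once per service, and the rule UID is derived from the inputs so that it stays stable across re-renders and unique per alert. The function names, the "svc_" prefix, the enumerated panel IDs and the service names are hypothetical; the `__dashboardUid__`/`__panelId__` annotation names are assumed to be what Grafana uses to link an alert to its panel.

```python
import hashlib


def render_alert_rule(dashboard_uid: str, panel_id: int, service: str) -> dict:
    """Render one alert rule for `service` from a shared template (hypothetical)."""
    # Hash the identifying inputs so the same alert always gets the same UID,
    # while different services or panels get different ones.
    key = f"{dashboard_uid}/{panel_id}/{service}"
    rule_uid = "svc_" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return {
        "uid": rule_uid,
        "title": f"{service} service alert",
        # Annotation names are an assumption about Grafana's panel linking.
        "annotations": {
            "__dashboardUid__": dashboard_uid,
            "__panelId__": str(panel_id),
        },
    }


def render_alert_rules(dashboard_uid: str, services: list[str]) -> list[dict]:
    """Render one rule per service; panel IDs are simply enumerated here."""
    return [
        render_alert_rule(dashboard_uid, panel_id, service)
        for panel_id, service in enumerate(services, start=1)
    ]
```

Deriving the UID from the template inputs rather than generating it randomly means that re-applying the states should not create a second copy of an alert under a new UID.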

Acceptance criteria

  • AC1: Alerts previously¹ defined in salt-states-openqa/monitoring/grafana/webui.services.json.template are using templating again (instead of having distinct YAML files for all those alerts).
  • AC2: Alerts previously¹ defined in salt-states-openqa/monitoring/grafana/certificates.json.template are using templating again (instead of having distinct YAML files for all those alerts).
  • AC3: All templated alerts still have unique rule_uids (as otherwise Grafana would complain).
  • AC4: There are no duplicates. Versions of alerts under their old rule_uids have been cleaned up.
  • AC5: References to corresponding dashboards/panels via IDs still work.

¹ The alerts are actually still defined in those files but no longer used. This should supposedly be cleaned up at some point but for now you can still find them and they might be a useful reference for this task.
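AC3 and AC4 could be checked mechanically on the rendered provisioning files. The following is a minimal sketch under the assumption that each file has already been parsed (e.g. from YAML) into a dict of the shape `groups -> rules -> uid`; that layout, the function name and the file names in the usage example are assumptions, not taken from the repository.

```python
from collections import defaultdict


def find_duplicate_rule_uids(files: dict[str, dict]) -> dict[str, list[str]]:
    """Map each rule UID that occurs more than once to the files containing it.

    `files` maps a file name to its already-parsed provisioning document.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for filename, doc in files.items():
        for group in doc.get("groups", []):
            for rule in group.get("rules", []):
                seen[rule["uid"]].append(filename)
    # Keep only UIDs that were seen in more than one place.
    return {uid: names for uid, names in seen.items() if len(names) > 1}


# Hypothetical usage with two rendered files that accidentally share a UID.
parsed = {
    "certificates.yaml": {"groups": [{"rules": [{"uid": "cert_a"}, {"uid": "cert_b"}]}]},
    "webui-services.yaml": {"groups": [{"rules": [{"uid": "svc_a"}, {"uid": "cert_b"}]}]},
}
duplicates = find_duplicate_rule_uids(parsed)
```

Note that this only covers the provisioned files; stale alerts that still exist in Grafana's database under old UIDs would not show up here.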

Actions #1

Updated by okurz about 1 year ago

  • Priority changed from Normal to High
  • Target version set to Ready

IMHO this ticket should have been part of the original work, so I'm bumping the prio so that we don't suffer from inefficient duplication for long.

Actions #2

Updated by mkittler about 1 year ago

  • Status changed from New to In Progress

Ok, then I'll continue with that today.

Actions #3

Updated by mkittler about 1 year ago

  • Assignee set to mkittler

Actions #5

Updated by openqa_review about 1 year ago

  • Due date set to 2023-04-15

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by mkittler about 1 year ago

  • Status changed from In Progress to Feedback

Actions #7

Updated by mkittler about 1 year ago

  • Status changed from Feedback to In Progress

Actions #8

Updated by mkittler about 1 year ago

  • Status changed from In Progress to Resolved

We checked the database for stale entries and eventually Nick deleted the problematic ones. So now it works (without further code changes). The relation between alerts and their panels is also correct (I have checked the certificate alerts and several service alerts).

I've also just removed the silence for the stale alert that is now gone as well.

So all ACs are fulfilled.
