action #133991

open

Cover same metric for different hosts with a single alert rule

Added by nicksinger over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-07-20
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

In https://progress.opensuse.org/issues/133130 I explored different ways of grouping alerts (by hostname, by alert, etc.) and realized that unified alerting would allow us to greatly reduce our number of alert rules. Our current alert rules are generated automatically by salt, one per host, but could be generalized so that one rule covers every host and no new rule has to be created for each host specifically.

To phrase it differently: we have n alert rule instances of the "host up" alert, one for each host. These could be reduced to a single alert rule by writing a query which groups by host. An example of a single alert rule covering all hosts can be found here: https://stats.openqa-monitor.qa.suse.de/alerting/grafana/b8b0597c-0aeb-4b0a-9337-6f225cd8c9d4/view
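
The idea of one rule grouped by host can be illustrated with a small sketch. This is not Grafana's API or the query used in the linked rule; it only simulates, in plain Python, how a single rule evaluated over samples from all hosts can still yield one alert instance per affected host. The `Sample` type, the `evaluate_host_up_rule` function, the `up` metric and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    host: str   # label the single rule groups by
    up: float   # hypothetical "host up" metric: 1.0 = reachable, 0.0 = down


def evaluate_host_up_rule(samples, threshold=1.0):
    """Evaluate ONE rule over all hosts and return one alert instance per failing host."""
    latest = {}
    for sample in samples:
        # Group by host, keeping only the most recent value per host.
        latest[sample.host] = sample.up
    return [
        {"rule": "host up", "host": host, "value": value}
        for host, value in sorted(latest.items())
        if value < threshold
    ]


# Hypothetical samples from three hosts; no per-host rule is defined anywhere.
samples = [
    Sample("worker1", 1.0),
    Sample("worker2", 1.0),
    Sample("worker3", 1.0),
    Sample("worker2", 0.0),  # worker2 goes down in its latest sample
]

alerts = evaluate_host_up_rule(samples)
for alert in alerts:
    print(alert)
```

Because each resulting alert instance carries the host as a label, the information a per-host rule would convey is preserved, and adding a new host requires no new rule as long as its samples carry the same label.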

Acceptance criteria

  • AC1: A single alert rule exists which replaces all current per-host alert rules covering the same metric
  • AC2: The single alert rule conveys the same amount of information as the individual per-host alert rules do
  • AC3: All newly created alert rules are deployed via salt. Old ones are removed from salt and its templates

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? (Resolved, nicksinger, 2023-07-20)

Actions #1

Updated by nicksinger over 1 year ago

  • Copied from action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Actions #2

Updated by nicksinger over 1 year ago

  • Copied from deleted (action #133130: Lots of alerts for a single cause. Can we group and de-duplicate?)
Actions #3

Updated by nicksinger over 1 year ago

  • Related to action #133130: Lots of alerts for a single cause. Can we group and de-duplicate? added
Actions #4

Updated by okurz over 1 year ago

  • Tags set to infra
  • Target version changed from Ready to future

Good idea, although likely not right now.
