action #133991
open
Cover same metric for different hosts with a single alert rule
0%
Description
Observation
In https://progress.opensuse.org/issues/133130 I explored different possibilities of grouping alerts (by hostname, by alert, etc.) and realized that unified alerting would allow us to greatly reduce our number of alert rules. Our current alert rules are automatically generated for each host by salt, but they could be generalized to cover every host without creating a dedicated rule for each one.
To phrase it differently: we currently have n instances of the "host up" alert rule, one for each host. These could be reduced to a single alert rule by writing a query which groups by host. An example of a single alert rule covering all hosts can be found here: https://stats.openqa-monitor.qa.suse.de/alerting/grafana/b8b0597c-0aeb-4b0a-9337-6f225cd8c9d4/view
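The idea of one rule producing one alert instance per host can be illustrated with a minimal Python sketch. This is not how Grafana evaluates rules internally; the sample data, the host names, the threshold and the function `evaluate_host_up` are all hypothetical and only mimic the effect of a query grouped by host.

```python
from collections import defaultdict

# Hypothetical samples: (host, ping success as 1.0 or 0.0). Not real data.
SAMPLES = [
    ("worker1", 1.0),
    ("worker1", 1.0),
    ("worker2", 0.0),
    ("worker2", 0.0),
    ("worker3", 1.0),
    ("worker3", 0.0),
]


def evaluate_host_up(samples, threshold=0.5):
    """Evaluate a single 'host up' rule and return one state per host."""
    grouped = defaultdict(list)
    for host, value in samples:
        # Collect values per host, which mimics "group by host" in the query.
        grouped[host].append(value)

    states = {}
    for host, values in grouped.items():
        mean = sum(values) / len(values)
        # Assumed condition: alert when the mean drops below the threshold.
        states[host] = "Alerting" if mean < threshold else "Normal"
    return states


states = evaluate_host_up(SAMPLES)
for host, state in sorted(states.items()):
    print(f"{host}: {state}")
```

A host that is added later simply shows up as a new key in the result, so no additional rule has to be generated for it.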
Acceptance criteria
- AC1: For each metric, a single alert rule exists which replaces all current per-host alert rules covering that metric
- AC2: The single alert conveys the same amount of information as the individual per-host alert rules do
- AC3: All newly created alert rules are deployed via salt. Old ones are removed from salt/the templates
Suggestions
- Check an example created by nsinger: https://stats.openqa-monitor.qa.suse.de/alerting/grafana/b8b0597c-0aeb-4b0a-9337-6f225cd8c9d4/view
- Read Grafana's documentation regarding templating alert messages: https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/message-templating/
- Test with a manually created alert and a limited number of recipients
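To check that a single rule can still convey per-host information (AC2), the message has to be rendered from the labels of each alert instance. The following Python sketch only illustrates that principle. Grafana uses its own Go-based templating as described in the linked documentation, so the template syntax, the label name `host` and the function `render_messages` here are assumptions made for illustration.

```python
from string import Template

# Hypothetical message template. Grafana's real templates use Go template
# syntax; string.Template is only a stand-in to show the principle.
MESSAGE = Template("$alertname: host $host is $state (value: $value)")


def render_messages(alertname, instances):
    """Render one message per alert instance using that instance's labels."""
    messages = []
    for instance in instances:
        messages.append(
            MESSAGE.substitute(
                alertname=alertname,
                host=instance["labels"]["host"],
                state=instance["state"],
                value=instance["value"],
            )
        )
    return messages


# Hypothetical alert instances as they might result from one grouped rule.
instances = [
    {"labels": {"host": "worker2"}, "state": "Alerting", "value": 0.0},
    {"labels": {"host": "worker3"}, "state": "Normal", "value": 0.5},
]

messages = render_messages("host up", instances)
for message in messages:
    print(message)
```

Because the host name comes from the instance labels, each notification still names the affected host even though only one rule is defined.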