action #99837
coordination #80142 (open): [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert
configurable exclusion rules for /influxdb/minion
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: QA (public, currently private due to #173521) - future
Start date: 2021-10-06
Due date:
% Done: 0%
Estimated time:
Description
Motivation
We have Minion jobs that sometimes fail, and as admins we are not interested in every type of Minion job failure. So far we exclude certain jobs in the OSD Telegraf+Grafana alerting, but for the openQA route /influxdb/minion we cannot do that. I therefore suggest creating configurable exclusion rules for the route /influxdb/minion so that it only reports the failures admins care about.
Acceptance criteria
- AC1: Grafana panels monitoring Minion job failures do not include any failed Minion jobs that match an exclusion rule, e.g. "obs_rsync_run"
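A minimal sketch of how such exclusion rules could filter the failure count before it is reported. It is written in Python for illustration only and is not the openQA implementation. The exclusion list, the glob-pattern matching, the job dict shape, and the measurement name `openqa_minion_jobs` are all assumptions. Apart from "obs_rsync_run", the task names are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion rules, e.g. read from a config option.
# Glob patterns are an assumption; a plain name like "obs_rsync_run" matches exactly.
EXCLUDED_TASKS = ["obs_rsync_run", "obs_rsync_update_*"]


def is_excluded(task: str, patterns: list[str]) -> bool:
    """Return True if the Minion task name matches any exclusion pattern."""
    return any(fnmatchcase(task, p) for p in patterns)


def count_failed(jobs: list[dict], patterns: list[str]) -> int:
    """Count failed jobs, skipping those whose task matches an exclusion rule."""
    return sum(
        1
        for job in jobs
        if job["state"] == "failed" and not is_excluded(job["task"], patterns)
    )


def influx_line(url: str, jobs: list[dict], patterns: list[str]) -> str:
    """Render one InfluxDB line-protocol record (assumed measurement and field names).

    The trailing "i" marks the field value as an integer in line protocol.
    """
    failed = count_failed(jobs, patterns)
    return f"openqa_minion_jobs,url={url} failed={failed}i"


# Hypothetical job list; only "obs_rsync_run" comes from the ticket.
jobs = [
    {"task": "obs_rsync_run", "state": "failed"},
    {"task": "finalize_job_results", "state": "failed"},
    {"task": "obs_rsync_update_builds_text", "state": "failed"},
    {"task": "finalize_job_results", "state": "finished"},
]

print(influx_line("http://localhost", jobs, EXCLUDED_TASKS))
# prints "openqa_minion_jobs,url=http://localhost failed=1i"
```

Only one of the three failed jobs survives the exclusion rules, so the Grafana panel fed by this route would show a single failure instead of three. Filtering on the server side keeps each consumer of /influxdb/minion from having to duplicate the exclusion logic.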