action #106904
coordination #80142 (closed): [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #102951: [epic] Better network performance monitoring
Monitoring for worker-specific bandwidth size:M
Description
User story
As an openQA instance administrator, I am alerted to unusual upload and download rates to and from individual workers so that I can identify problems in our infrastructure.
Acceptance criteria
- AC1: monitor.qa has an alert for each worker based on that worker's cache service bandwidth
Suggestions
- Extend the Telegraf config, following the existing example that reads worker minion jobs, to read the bandwidth data provided by the new user story #106901; see https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/monitoring/telegraf/telegraf-worker.conf#L29
- Add new panels in Grafana based on the new data
- Define alert thresholds and add alert descriptions
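
One possible way to feed the bandwidth data into Telegraf is a small script whose output Telegraf collects, for example through its `exec` input with `data_format = "influx"`. The sketch below only illustrates that idea. The JSON keys `download_rate` and `upload_rate`, the measurement name `openqa_worker_bandwidth`, and the `worker` tag are assumptions made for illustration; the actual data format depends on what #106901 ends up providing.

```python
import json


def to_line_protocol(measurement, tags, fields):
    """Format one reading as an InfluxDB line protocol line without a timestamp.

    Tag values are assumed to contain no spaces, commas, or equals signs,
    so no escaping is done here.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(
        f"{key}={float(value)}" for key, value in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part}"


def bandwidth_line(payload, worker):
    """Turn a hypothetical JSON bandwidth payload into one line protocol line.

    The keys "download_rate" and "upload_rate" are placeholders, not the
    real output of the cache service.
    """
    data = json.loads(payload)
    fields = {
        key: data[key] for key in ("download_rate", "upload_rate") if key in data
    }
    return to_line_protocol("openqa_worker_bandwidth", {"worker": worker}, fields)


if __name__ == "__main__":
    # Example payload with made-up numbers, only to show the output shape.
    sample = '{"download_rate": 1200, "upload_rate": 300.5}'
    print(bandwidth_line(sample, "worker1"))
```

Tagging each point with the worker name would let a Grafana panel and its alert rule be grouped per worker, which is what AC1 asks for.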