action #138005
closed
grafana panel "Packet loss between worker hosts and other hosts" shows more than just ping to "other hosts" and hence becomes slow and triggers redundant alerts size:M
Start date: 2023-10-14
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
https://monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=4 should show the ping results for packet loss to "other hosts", i.e. hosts that are not under our salt control, like dist.suse.de. However, the panel shows a very long, slowly rendering list that includes multiple redundant entries already covered by other panels, e.g. "worker40 - openqa.suse.de" and "openqa - tumblesle.qe.nue2.suse.org", all of which are salt-controlled.
Acceptance criteria
- AC1: The panel neither shows nor alerts on packet loss to salt-controlled machines
- AC2: The panel still shows and alerts on packet loss to any "other host"
Suggestions
- Look into references of "inputs.ping" in https://gitlab.suse.de/openqa/salt-states-openqa, in particular in monitoring/telegraf/telegraf-worker.conf. Maybe we should use a special "tag", as in 6c3e70e and bce9156 for #137522, to distinguish generic ping from this ping to "external_hosts"
- Consider changing how data is pushed by telegraf into influxdb
- Consider changing the monitoring query in https://monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=4&editPanel=4
- Consider adjusting the corresponding alert as well
- Optional: Consider changing the panel to include data from more than just openQA workers
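The tagging idea from the first suggestion could look roughly like the following Telegraf fragment. This is only a sketch under assumptions: the tag name `ping_target` and its value `external_hosts` are hypothetical placeholders (the tag actually introduced in 6c3e70e and bce9156 is not reproduced here), and the host list and `count` value are purely illustrative. In practice the list would presumably be generated from "required_external_networks".

```toml
# Hypothetical sketch, not the actual content of telegraf-worker.conf.
# A dedicated ping input for hosts that are outside of salt control.
[[inputs.ping]]
  # Example host taken from the ticket; the real list would come from pillar data
  urls = ["dist.suse.de"]
  # Number of echo requests sent per URL and collection interval
  count = 1

  # Per-input tags have to come last in the plugin section.
  # Tag name and value are assumptions made for illustration.
  [inputs.ping.tags]
    ping_target = "external_hosts"
```

With such a tag in place, the panel and alert query could be restricted with a condition along the lines of `WHERE "ping_target" = 'external_hosts'`, so that ping series for salt-controlled targets no longer appear in the panel.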
Further details
- "other host" means any "host" entry under "required_external_networks" in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L15