action #169585
closed [alert] Adapt "Packet loss between worker hosts and other hosts" alert (and potentially other related alerts) after CC-related network separation - split "required_external_networks" monitoring by source location size:S
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2024-11-08
Due date:
% Done:
0%
Estimated time:
Description
Observation
Since the CC-related network separation, some hosts can no longer reach each other: https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=panel-4&from=now-6h&to=now
Mitigations
- The alert was silenced
Suggestions
- PRG2-based workers can't reach mirror.nue2.suse.org and don't need to, so exclude that host from their monitoring
- NUE2-based workers can't reach qe-jumpy.prg2.suse.org and don't need to, so exclude that host from their monitoring as well
- Fix this in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L18 and wherever "required_external_networks" is used (e.g. https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/monitoring/telegraf/telegraf-worker.conf?ref_type=heads#L25-43)
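The idea of splitting "required_external_networks" by source location can be sketched roughly as follows. This is only an illustration in Python, not the actual salt pillar or telegraf template: the dictionary layout, the location labels, and the function name `ping_targets` are assumptions, and which hosts each location really needs to reach would have to be taken from the real configuration.

```python
# Hypothetical per-location split of "required_external_networks".
# The structure and the location keys are assumptions for illustration;
# the real pillar layout in workerconf.sls may look different.
REQUIRED_EXTERNAL_NETWORKS = {
    # Assumed: NUE2-based workers only monitor hosts they can reach.
    "nue2": ["mirror.nue2.suse.org"],
    # Assumed: PRG2-based workers only monitor hosts they can reach.
    "prg2": ["qe-jumpy.prg2.suse.org"],
}


def ping_targets(location: str) -> list[str]:
    """Return the hosts a worker in the given location should monitor.

    Unknown locations get an empty list, so no packet-loss alert is
    produced for hosts that were never meant to be reachable.
    """
    return list(REQUIRED_EXTERNAL_NETWORKS.get(location.lower(), []))
```

With such a split, a PRG2-based worker would no longer ping mirror.nue2.suse.org, and a NUE2-based worker would no longer ping qe-jumpy.prg2.suse.org, which is what the suggestions above ask for.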
Updated by mkittler about 1 month ago
- Subject changed from [alert] Adapt "Packet loss between worker hosts and other hosts" alert after CC-related network separation to [alert] Adapt "Packet loss between worker hosts and other hosts" alert (and potentially other related alerts) after CC-related network separation
Updated by okurz about 1 month ago
- Subject changed from [alert] Adapt "Packet loss between worker hosts and other hosts" alert (and potentially other related alerts) after CC-related network separation to [alert] Adapt "Packet loss between worker hosts and other hosts" alert (and potentially other related alerts) after CC-related network separation - split "required_external_networks" monitoring by source location size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 month ago
- Copied to action #169744: [alert] Adapt "Packet loss between worker hosts and other hosts" alert (and potentially other related alerts) after CC-related network separation - fix mirror.nue2 monitoring from NUE2 size:S added
Updated by mkittler about 1 month ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
Updated by mkittler about 1 month ago
- Status changed from In Progress to Feedback
Updated by mkittler about 1 month ago
- Status changed from Feedback to Resolved
Looks like this generally worked. I applied the change manually on the relevant disconnected hosts. There is no need to wait for the alert to cool off before resolving this ticket, because the alert will keep firing anyway due to #169744.