action #137600
[alert] Packet loss between worker hosts and other hosts size:S (closed)
Description
Observation
We had multiple occurrences of the packet loss alert over the weekend:
alertname Packet loss between worker hosts and other hosts alert
grafana_folder Salt
rule_uid 2Z025iB4km
http://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk?orgId=1&viewPanel=4
Currently, the problematic ones according to the panel are:
imagetester - walter1.qe.nue2.suse.org 100%
petrol-1 - walter1.qe.nue2.suse.org 100%
sapworker1 - walter1.qe.nue2.suse.org 100%
That is a bit odd: I manually checked the first pair, and the two hosts can reach each other fine:
walter1:~ # ping imagetester.qe.nue2.suse.org
PING imagetester.qe.nue2.suse.org (10.168.192.249) 56(84) bytes of data.
64 bytes from imagetester.qe.nue2.suse.org (10.168.192.249): icmp_seq=7 ttl=64 time=0.326 ms
jbaier@imagetester:~> ping walter1.qe.nue2.suse.org
PING walter1.qe.nue2.suse.org (10.168.192.1) 56(84) bytes of data.
64 bytes from walter1.qe.nue2.suse.org (10.168.192.1): icmp_seq=1 ttl=64 time=0.331 ms
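The manual check above could be repeated for several host pairs with a small script. The following is only a minimal sketch, assuming the Linux iputils `ping` summary line format ("N% packet loss"); the helper names `parse_packet_loss` and `measure_packet_loss` are hypothetical and are not part of the monitoring setup discussed in this ticket.

```python
import re
import subprocess

# Matches the summary line printed by Linux iputils ping, e.g.
# "5 packets transmitted, 5 received, 0% packet loss, time 4005ms"
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet loss percentage found in ping's summary line."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        # Happens e.g. when the host name cannot be resolved and ping
        # prints no summary at all.
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def measure_packet_loss(host: str, count: int = 5) -> float:
    """Ping `host` `count` times and return the reported loss percentage.

    Hypothetical helper: it only wraps the same `ping` call that was
    run by hand above.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; the output is still needed
    )
    return parse_packet_loss(result.stdout)
```

Calling `measure_packet_loss("walter1.qe.nue2.suse.org")` on imagetester, and the reverse direction on walter1, would give a number to compare directly against the 100% shown in the panel.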
Suggestions
- Confirm when this started happening or if it's no longer an issue
- There are no paused alerts
Updated by livdywan about 1 year ago
- Subject changed from [alert] Packet loss between worker hosts and other hosts to [alert] Packet loss between worker hosts and other hosts size:S
- Description updated (diff)
- Status changed from New to Feedback
- Assignee set to livdywan
Maybe it's already fine as-is. I'll monitor this a bit.
Updated by okurz about 1 year ago
- Status changed from Feedback to Workable
nope, not fine, see https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=4&from=1697037757451&to=1697053106647 diesel<->walter
Updated by livdywan about 1 year ago
- Assignee deleted (livdywan)
okurz wrote in #note-3:
nope, not fine, see https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=4&from=1697037757451&to=1697053106647 diesel<->walter
Thank you for taking over, that was the point of my taking the ticket ;-)
Updated by livdywan about 1 year ago
- Related to action #138044: Grouped seemingly unrelated alert emails are confusing size:M added
Updated by livdywan about 1 year ago
- Related to action #138005: grafana panel "Packet loss between worker hosts and other hosts" shows more than just ping to "other hosts" and hence becomes slow and triggers redundant alerts size:M added
Updated by livdywan about 1 year ago
Dominik and I were trying to investigate the current packet loss alert... or should I say alerts?
diesel-1 - walter1.qe.nue2.suse.org 100%
imagetester - walter1.qe.nue2.suse.org 100%
openqa - ada.qe.suse.de 100%
openqaworker1 - walter1.qe.nue2.suse.org 100%
petrol-1 - walter1.qe.nue2.suse.org 100%
sapworker2 - walter1.qe.nue2.suse.org 100%
This is what the graph showed at that specific point in time. We couldn't figure out which hosts had become problematic or what the state was before that, so without trying to be dramatic, I don't find this alert/graph very actionable in its current form.
Updated by okurz about 1 year ago
- Related to action #138038: diesel+petrol missing network, IPMI still reachable added
Updated by okurz about 1 year ago
- Status changed from Workable to Blocked
- Assignee set to okurz
- Priority changed from High to Low
- Target version changed from Ready to Tools - Next
Updated by okurz 9 months ago
- Status changed from Blocked to Resolved
- Target version changed from Tools - Next to Ready
All three referenced tickets are resolved, https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?orgId=1&viewPanel=4 looks green, and no related alert silences are left.