action #168214
closed
[alert] Average Ping time (ms) alert size:S
Description
Observation
Firing:
Date: Sun, 13 Oct 2024 05:39:36 +0200
https://stats.openqa-monitor.qa.suse.de/alerting/grafana/Fm02cmf4z/view?orgId=1
Resolved:
Date: Sun, 13 Oct 2024 05:44:36 +0200
Suggestions
- Check if you can ping the machine (network connection within the infrastructure might be disrupted)
- Log in over SSH if possible; otherwise use a management interface, e.g. IPMI (the machine could be stuck in the boot process)
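The first suggestion can be scripted. The following is a minimal sketch, assuming a Linux host where the `ping` binary prints the usual `rtt min/avg/max/mdev` summary line; the function names `parse_avg_rtt` and `average_ping_ms` and the default values are hypothetical and are not part of the monitoring setup referenced in this ticket.

```python
import re
import subprocess

# Matches the summary line printed by ping, e.g.
# "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms" (Linux iputils) or
# "round-trip min/avg/max/stddev = ..." (BSD/macOS).
_RTT_SUMMARY = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"[\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms"
)


def parse_avg_rtt(ping_output):
    """Return the average round-trip time in ms, or None if no summary line is found."""
    match = _RTT_SUMMARY.search(ping_output)
    return float(match.group(1)) if match else None


def average_ping_ms(host, count=5, timeout_s=30):
    """Ping `host` `count` times and return the average RTT in ms.

    Returns None if the host does not answer, the command times out,
    or the ping binary is not available.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    return parse_avg_rtt(result.stdout)
```

A result of `None` from `average_ping_ms` would be the cue to move on to the second suggestion and try SSH or the management interface.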
Updated by mkittler about 2 months ago
- Subject changed from [alert] Average Ping time (ms) alert to [alert] Average Ping time (ms) alert size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz about 2 months ago
- Status changed from Workable to In Progress
Updated by jbaier_cz about 2 months ago
- Status changed from In Progress to Feedback
The host in question looks perfectly OK; I was not able to identify the problem. On the other hand, the average for this host is higher than for the others almost all the time. That might hint at some problem with the machine itself (it is an old machine) or maybe with the network cable. We should keep this in mind, and if the problem recurs, either accept it and adjust the limit or investigate further at the hardware level.
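One way to tell a persistently slow host from a one-off spike is to count how often its ping exceeds the median of the other hosts. This is only an illustrative sketch: the function name `fraction_above_peers` and the input layout are assumptions, and it is not the query used by the alert.

```python
from statistics import median


def fraction_above_peers(host_samples, peer_samples):
    """Share of time slots in which the host's ping exceeds the median of its peers.

    host_samples: list of ping times (ms), one per time slot.
    peer_samples: list of lists; peer_samples[i] holds the other hosts'
                  ping times (ms) for time slot i.
    """
    if not host_samples or len(host_samples) != len(peer_samples):
        raise ValueError("need one peer sample list per host sample")
    above = sum(
        1 for host_ms, peers in zip(host_samples, peer_samples)
        if host_ms > median(peers)
    )
    return above / len(host_samples)
```

A value close to 1.0 over a long window would support adjusting the limit for that host, while a low value would suggest that the alert caught a genuine anomaly.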
Updated by okurz about 2 months ago
jbaier_cz wrote in #note-4:
The host in question looks perfectly OK; I was not able to identify the problem. On the other hand, the average for this host is higher than for the others almost all the time. That might hint at some problem with the machine itself (it is an old machine) or maybe with the network cable. We should keep this in mind, and if the problem recurs, either accept it and adjust the limit or investigate further at the hardware level.
First of all: Which "host in question" would that be? Second, if the ping is high, maybe the switch port itself shows unusually high traffic? You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893 related to #133700 :)
Updated by jbaier_cz about 2 months ago
Which "host in question" would that be?
Ah, right. I forgot to mention that. In this case, the alert was about backup-qam: https://racktables.suse.de/index.php?page=object&tab=default&object_id=9264
You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893 related to #133700 :)
Looks like #133700 could indeed help at least a little.
Updated by livdywan about 2 months ago
jbaier_cz wrote in #note-6:
Which "host in question" would that be?
Ah, right. I forgot to mention that. In this case, the alert was about backup-qam: https://racktables.suse.de/index.php?page=object&tab=default&object_id=9264
You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893 related to #133700 :)
Looks like #133700 could indeed help at least a little.
Are you suggesting we block on #133700? Keep in mind that ticket's been blocked for a while already.
Otherwise I'd say wrap it up and adjust the limits if it comes back.
Updated by jbaier_cz about 2 months ago
- Status changed from Feedback to Resolved
No reason to keep this open any longer.