action #168214

closed

[alert] Average Ping time (ms) alert size:S

Added by tinita about 1 month ago. Updated 15 days ago.

Status: Resolved
Priority: High
Assignee: jbaier_cz
Category: Regressions/Crashes
Start date: 2024-10-14
% Done: 0%

Description

Observation

Firing:
Date: Sun, 13 Oct 2024 05:39:36 +0200
https://stats.openqa-monitor.qa.suse.de/alerting/grafana/Fm02cmf4z/view?orgId=1

Resolved:
Date: Sun, 13 Oct 2024 05:44:36 +0200

Suggestions

  • Check whether you can ping the machine (the network connection within the infrastructure might be disrupted)
  • Log in over SSH if possible; otherwise use a management interface, e.g. IPMI (the machine could be stuck in the boot process)
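
The first suggestion can be checked by hand with plain `ping`. To get an average round-trip time to set beside the alert's "Average Ping time (ms)" panel, a minimal Python sketch could look like this. It assumes a Linux or BSD `ping` that accepts `-c` and prints a `min/avg/max/...` summary line; the function names `parse_avg_rtt` and `ping_avg` are hypothetical, and this is not how the Grafana alert itself collects its data.

```python
import re
import subprocess
from typing import Optional

# Summary line printed by ping, e.g. on Linux (iputils):
#   "rtt min/avg/max/mdev = 0.045/0.058/0.072/0.011 ms"
# BSD/macOS print "round-trip min/avg/max/stddev = ..." with the same layout.
_SUMMARY_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_avg_rtt(ping_output: str) -> Optional[float]:
    """Return the average RTT in ms from ping's summary line, or None if absent."""
    match = _SUMMARY_RE.search(ping_output)
    return float(match.group(2)) if match else None


def ping_avg(host: str, count: int = 5, timeout_s: float = 30.0) -> Optional[float]:
    """Ping `host` `count` times; return the average RTT in ms, or None on failure."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # ping hung, or no ping binary is installed
        return None
    if result.returncode != 0:
        # e.g. no reply received at all, or an unknown host
        return None
    return parse_avg_rtt(result.stdout)
```

For example, `ping_avg("backup-qam")` returns the average RTT in milliseconds, or `None` if the host did not answer (assuming the name resolves from wherever you run it).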

#1

Updated by okurz 29 days ago

  • Assignee set to jbaier_cz
#2

Updated by mkittler 29 days ago

  • Subject changed from [alert] Average Ping time (ms) alert to [alert] Average Ping time (ms) alert size:S
  • Description updated (diff)
  • Status changed from New to Workable
#3

Updated by jbaier_cz 29 days ago

  • Status changed from Workable to In Progress
#4

Updated by jbaier_cz 29 days ago

  • Status changed from In Progress to Feedback

The host in question looks perfectly OK; I was not able to identify the problem. On the other hand, the average for this host is higher than for the others almost all the time. That might hint at a problem with the machine itself (it is an old machine) or with the network cable. We should keep this in mind and, if the problem recurs, either accept it and adjust the limit or investigate further at the hardware level.
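
If the limit does get adjusted, one option (an assumption, not how the Grafana alert is currently configured) is to derive a per-host limit from each host's own baseline, so that a host that is consistently slower, such as an old machine, does not keep tripping a single global threshold. A minimal sketch; `per_host_limits`, `hosts_over_limit` and the `margin` factor are hypothetical names and values:

```python
from statistics import median
from typing import Dict, List, Mapping, Sequence


def per_host_limits(
    history: Mapping[str, Sequence[float]],
    default_limit_ms: float,
    margin: float = 2.0,
) -> Dict[str, float]:
    """Per-host limit: margin * median of past averages, never below the default."""
    limits: Dict[str, float] = {}
    for host, rtts in history.items():
        baseline = median(rtts) if rtts else 0.0
        limits[host] = max(default_limit_ms, margin * baseline)
    return limits


def hosts_over_limit(
    current: Mapping[str, float],
    limits: Mapping[str, float],
    default_limit_ms: float,
) -> List[str]:
    """Hosts whose current average exceeds their limit (default for unknown hosts)."""
    return sorted(
        host for host, avg in current.items()
        if avg > limits.get(host, default_limit_ms)
    )
```

Using the median keeps short spikes out of the baseline, and taking the `max` with the default limit means no host ends up with a stricter limit than before.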

#5

Updated by okurz 29 days ago

jbaier_cz wrote in #note-4:

> The host in question looks perfectly OK; I was not able to identify the problem. On the other hand, the average for this host is higher than for the others almost all the time. That might hint at a problem with the machine itself (it is an old machine) or with the network cable. We should keep this in mind and, if the problem recurs, either accept it and adjust the limit or investigate further at the hardware level.

First of all, which "host in question" would that be? And second, if there is a high ping, maybe the switch port itself shows unusually high traffic? You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893, related to #133700 :)

#6

Updated by jbaier_cz 29 days ago

> Which "host in question" would that be?

Ah, right. I forgot to mention that. In this case, the alerting was about backup-qam: https://racktables.suse.de/index.php?page=object&tab=default&object_id=9264

> You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893 related to #133700 :)

Looks like #133700 could indeed help at least a little.

#7

Updated by livdywan 22 days ago

jbaier_cz wrote in #note-6:

>> Which "host in question" would that be?

> Ah, right. I forgot to mention that. In this case, the alerting was about backup-qam: https://racktables.suse.de/index.php?page=object&tab=default&object_id=9264

>> You can take this as an opportunity to remind IT about https://jira.suse.com/browse/ENGINFRA-1893 related to #133700 :)

> Looks like #133700 could indeed help at least a little.

Are you suggesting we block on #133700? Keep in mind that ticket's been blocked for a while already.

Otherwise I'd say wrap it up and adjust the limits if it comes back.

#8

Updated by jbaier_cz 15 days ago

  • Status changed from Feedback to Resolved

No reason to keep this open any longer.
