action #114802

closed

Handle "QA network infrastructure Package loss alert" introduced by #113746 size:M

Added by mkittler almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2022-07-28
Due date: 2022-10-12
% Done: 0%
Estimated time:

Description

The alerts introduced by #113746 are firing because not all hosts mentioned in that ticket's description are actually pingable.

Acceptance criteria

  • AC1: All package/packet loss alerts are unpaused again and no longer firing, because the problematic hosts have either been recovered or removed from the checks.

Suggestions

  • Check whether problematic hosts should be online or offline. If they should be online, try recovering them. If they should be offline, remove them from the list of checked hosts.
  • At this time, there is only one problematic host (s390zp14.suse.de). The alert fires multiple times only because it is triggered once for each worker that cannot reach that host.
  • To identify the problematic hosts, look at the panel of one of the package/packet loss alerts.
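
The first suggestion (deciding which checked hosts are actually reachable) could be sketched roughly as follows. This is only an illustration, not the monitoring setup used by the alerts: the helper names `is_pingable` and `unreachable_hosts` are hypothetical, and the `ping` flags `-c` and `-W` assume the Linux iputils `ping`.

```python
import subprocess
from typing import Callable, Iterable, List


def is_pingable(host: str, count: int = 1, timeout_s: int = 2) -> bool:
    """Return True if `host` answers an ICMP echo request.

    Assumption: Linux iputils `ping`, where -c is the packet count
    and -W is the per-reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code just means "not reachable"
    )
    return result.returncode == 0


def unreachable_hosts(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = is_pingable,
) -> List[str]:
    """Return the hosts for which `probe` reports no reply, in input order."""
    return [host for host in hosts if not probe(host)]
```

Calling `unreachable_hosts(["s390zp14.suse.de"])` from one of the affected workers would return the hosts that this worker cannot reach. Each host in the result then needs the decision described above: recover it if it should be online, or remove it from the list of checked hosts if it should be offline.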

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #113746: monitoring: The grafana "ping time" panel does not list all hosts size:S · Resolved · tinita · 2022-07-18 · 2022-08-09
