action #132812

closed

[alert] openqaw5-xen host up alert + infrastructure ping size:M

Added by okurz over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-07-16
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

The monitoring panel at https://monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1&from=now-6h&to=now shows 100% packet loss between qa-power8-4 and openqaw5-xen.

Acceptance criteria

  • AC1: Alert resolved
  • AC2: The packet loss alert should only fire if a related "host up" alert is not already firing
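AC2 amounts to a suppression rule between two alerts. Below is a minimal Python sketch of that rule, not the actual alert configuration; the function name and the idea of passing in the set of hosts that currently have a firing "host up" alert are assumptions for illustration.

```python
from typing import AbstractSet


def should_notify_packet_loss(
    source: str,
    target: str,
    packet_loss_firing: bool,
    hosts_with_host_up_alert: AbstractSet[str],
) -> bool:
    """Return True if a packet loss alert for source -> target should notify.

    Hypothetical helper: the packet loss alert is suppressed whenever either
    endpoint already has a firing "host up" alert, so one outage raises one alert.
    """
    if not packet_loss_firing:
        return False
    # A "host up" alert on either end already explains the loss on this path.
    related_host_up_alert = (
        source in hosts_with_host_up_alert or target in hosts_with_host_up_alert
    )
    return not related_host_up_alert


# openqaw5-xen has a "host up" alert, so the qa-power8-4 -> openqaw5-xen loss stays quiet.
print(should_notify_packet_loss("qa-power8-4", "openqaw5-xen", True, {"openqaw5-xen"}))  # prints False
```

Where this logic actually lives (alert query, notification routing) is left open here; the sketch only pins down the intended behavior.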

Suggestions

  • Look into the individual alerts and fix the error source
  • Crosscheck the definitions of the "host up" and "packet loss" alerts: do they overlap redundantly? If okurz remembers correctly, the packet loss alert was intended to fire only on significant packet loss, not when hosts are completely down
  • Ensure all rollback steps are conducted
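The intended split between the two alerts can be sketched as a classification of a single ping path's loss. This is a minimal illustration under stated assumptions: the 30% threshold and the category names are hypothetical, and treating 100% loss as an outage left to the "host up" alert follows the intent described above, not a confirmed alert rule.

```python
def classify_ping_loss(loss_percent: float, threshold_percent: float = 30.0) -> str:
    """Classify packet loss on one ping path (hypothetical threshold and labels)."""
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError(f"loss_percent out of range: {loss_percent}")
    if loss_percent >= 100.0:
        # Complete loss: treated as an outage, assumed to be covered by "host up".
        return "host_down"
    if loss_percent >= threshold_percent:
        # Significant but partial loss: the case the packet loss alert is meant for.
        return "packet_loss"
    return "ok"
```

With this split, the 100% loss seen between qa-power8-4 and openqaw5-xen would be classified as "host_down" rather than as a packet loss alert.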

Rollback steps

  • Remove related silences

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #132500: NUE1-SRV2, .qa.suse.de, aarch64 workers offline due to heat-related SRV2 shutdown size:M (Resolved, nicksinger, 2023-07-27)
