action #154627

closed

[potential-regression] Ensure that our "host up" alert alerts on not host-up conditions size:M

Added by okurz 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #150983 and #151588. Our "host up" alert likely shows "no data" rather than firing for salt-controlled hosts that are temporarily down, but this needs to be cross-checked.

Acceptance criteria

  • AC1: We are alerted if a host that is currently in salt is down
  • AC2: There is only one firing alert at a time when a host that is currently in salt is down
  • AC3: There is no firing alert after a reasonable time once we have removed a host from salt control, i.e. removed its key from the salt keys on OSD and potentially re-deployed a high state
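
The three criteria can be read as one small decision rule. The following is only a sketch of that rule in Python, not the actual alert definition on monitor.qa.suse.de: the function name `firing_alerts`, the host names and the choice to treat "no data" like a failed ping are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def firing_alerts(salt_hosts: Set[str], ping_ok: Dict[str, Optional[bool]]) -> Set[str]:
    """Return the hosts for which a "host up" alert should fire.

    salt_hosts: hosts whose keys are currently accepted in salt (hypothetical input).
    ping_ok: latest ping result per host: True (reachable), False (ping failed)
             or None (no data). A host missing from the mapping counts as no data.
    """
    firing = set()
    for host in salt_hosts:
        status = ping_ok.get(host)
        # AC1: assumption: "no data" is treated the same as a failed ping,
        # so a host that is down can not hide behind missing measurements.
        if status is not True:
            firing.add(host)
    # AC2: the result is a set, so each down host appears exactly once.
    # AC3: hosts outside salt_hosts are never looked at, so they never fire.
    return firing


if __name__ == "__main__":
    # "worker-a", "worker-b" and "retired" are made-up host names.
    print(firing_alerts({"worker-a", "worker-b"},
                        {"worker-a": True, "worker-b": None, "retired": False}))
```

In this sketch a host removed from salt drops out of the result as soon as it leaves `salt_hosts`, which is the behaviour AC3 asks for "after reasonable time".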

Suggestions

  • On monitor.qa.suse.de select any host, show the "host up" panel, then shut down the machine and check how the ping behaves, e.g. select tumblesle on qamaster
  • Fix the alert or, if everything works fine, convince everybody that this was a big noise about nothing
  • Extend our documentation in the salt-states repo, team wiki or openQA wiki, as applicable, on how to take hosts down and bring them back up, e.g. review https://progress.opensuse.org/projects/openqav3/wiki/#Take-machines-out-of-salt-controlled-production

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure - action #151588: [potential-regression] Our salt node up check in osd-deployment never fails size:M | Rejected | okurz | 2023-11-28
