action #128561

closed

salt managed host being down does not trigger any alert (was: jenkins.qa.suse.de stuck in emergency mode but no alert) size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Start date:
2023-05-03
Due date:
2023-07-04
% Done:

0%

Estimated time:

Description

Observation

In https://suse.slack.com/archives/C02CANHLANP/p1683041198918179 DimStar noted:

(Dominique Leuenberger) @Oliver Kurz Hi; seems jenkins is no longer scheduling QA runs for GNOME:Next - https://openqa.opensuse.org/group_overview/35 lists last run 3 days ago
(Fabian Vogt) Time to migrate to obs_rsync? Probably. The jenkins host is unreachable. Either down or overloaded.

There is a monitoring panel "host up" with a corresponding alert. Looking at
https://monitor.qa.suse.de/d/GDjenkins/dashboard-for-jenkins?orgId=1&viewPanel=65105&from=1682650896818&to=1683189154032&editPanel=65105
one can see that there was a long window with no response but no alert. We likely went a bit too far in ignoring all "no data" conditions. The panel should not look at the response time but at the "result_code" of the ping, which always has a valid value and can be checked for host responses.

Acceptance criteria

  • AC1: There is an alert when the machine is not fully up
  • AC2: There is no alert for usual planned reboots

Suggestions

  • Change "host up" to look at "result_code" and pick a sensible alert linked to that
  • Check whether we already have a ticket for the same issue
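
A minimal sketch of the first suggestion, to illustrate how an alert on "result_code" could satisfy both AC1 and AC2. It is not the actual Grafana alert rule. Assumptions: samples arrive as (timestamp, result_code) pairs, a result_code of 0 means the ping succeeded and any other value means failure, and the 10 minute grace period is a hypothetical value chosen so that a usual planned reboot does not trigger the alert. The names host_down_duration and should_alert are made up for illustration.

```python
from typing import Iterable, Tuple

# Assumption: each sample is (unix_timestamp_seconds, result_code),
# where result_code 0 means the ping succeeded and anything else is a failure.
Sample = Tuple[float, int]


def host_down_duration(samples: Iterable[Sample]) -> float:
    """Return how many seconds the host has been failing up to the newest sample."""
    ordered = sorted(samples)
    if not ordered or ordered[-1][1] == 0:
        # No samples at all, or the newest ping succeeded: host is not considered down.
        return 0.0
    newest_ts = ordered[-1][0]
    first_failure_ts = newest_ts
    # Walk backwards from the newest sample until the most recent successful ping.
    for ts, code in reversed(ordered):
        if code == 0:
            break
        first_failure_ts = ts
    return newest_ts - first_failure_ts


def should_alert(samples: Iterable[Sample], grace_seconds: float = 600.0) -> bool:
    """Alert only if the host has been failing continuously for the grace period.

    grace_seconds is a hypothetical threshold; it should be longer than a
    usual planned reboot (AC2) but short enough to catch a real outage (AC1).
    """
    return host_down_duration(samples) >= grace_seconds
```

Because the check uses the failure code rather than the response time, a host that stops answering still produces evaluable data points instead of "no data", which is the gap described in the observation above.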

Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #130132: jenkins.qa.suse.de seems down (Resolved, livdywan, 2023-05-31)

Has duplicate openQA Infrastructure (public) - action #131303: [alert] Packet loss between worker hosts and other hosts (tumblesle.qa.suse.de) (Resolved, dheidler, 2023-06-23)

Copied to openQA Infrastructure (public) - action #130633: Better documentation on jenkins.qa.suse.de alerts and recovery (Resolved, okurz)
