action #71191

closed

inform EngInfra automatically if the IPMI interfaces are not accessible

Added by okurz over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

As discussed in https://infra.nue.suse.com/SelfService/Display.html?id=175681#txn-2552251, at least rklein is OK with us automatically creating tickets based on alerts. So we can try defining another notification channel in https://stats.openqa-monitor.qa.suse.de/alerting/notifications and using it in the "long-time" alerts of arm-1 through arm-3.


Related issues: 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #69610: ipmi management interface of openqaworker-arm-3 is inaccessible (Resolved, okurz, 2020-07-16)

Actions #1

Updated by okurz over 4 years ago

  • Copied from action #69610: ipmi management interface of openqaworker-arm-3 is inaccessible added
Actions #2

Updated by okurz over 4 years ago

  • Status changed from New to Feedback

I created a new notification channel "infra@suse.de" on https://stats.openqa-monitor.qa.suse.de/alerting/notifications with the email address infra@suse.de and with "Disable Resolve Message" enabled, to prevent any infra ticket from being reopened after EngInfra has solved the problem.
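For illustration, the channel settings described above could be expressed as a payload for Grafana's legacy alert notification API. This is only a sketch: the channel was created through the web UI, the helper name `build_email_channel` is hypothetical, and the field names follow the legacy notification channel JSON as I understand it, so check them against the Grafana version in use.

```python
import json


def build_email_channel(name, address, disable_resolve=True):
    """Build a legacy-style Grafana email notification channel payload.

    Field names are an assumption based on the legacy alerting API;
    nothing is sent anywhere, the dict is only constructed.
    """
    return {
        "name": name,
        "type": "email",
        "isDefault": False,
        "sendReminder": False,
        # Suppress the "resolved" mail so a closed infra ticket
        # is not reopened once the alert recovers.
        "disableResolveMessage": disable_resolve,
        "settings": {"addresses": address},
    }


channel = build_email_channel("infra@suse.de", "infra@suse.de")
print(json.dumps(channel, indent=2))
```

The important part is `disableResolveMessage`: without it, the recovery mail would land in the same ticket thread.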

Now we can add a custom message to the alert. I tried to avoid the hostname in the message so that we can rely on the subject:

The IPMI management interface for this machine is inaccessible (again). The machine itself is also not reachable over ping.

Suggested action: Reset the machine including the management interface.

Similar issues were handled in https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Display.html?id=166330 and https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same machine 

To ensure the ticket is prioritized accordingly, I prefixed the name of the alert rule with [openqa].
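The changes to an alert rule (name prefix, custom message, extra notification channel) could be sketched like this. It assumes a legacy Grafana panel alert represented as a dict with `name`, `message` and `notifications` keys; the function name, the example alert name and the channel uid `infra-email` are hypothetical and not taken from the merge request.

```python
PREFIX = "[openqa]"


def route_alert_to_infra(alert, channel_uid, message):
    """Return a copy of an alert definition that also notifies EngInfra.

    `alert` is assumed to be a legacy Grafana panel alert dict;
    the input is not modified.
    """
    updated = dict(alert)
    name = updated.get("name", "")
    # Prefix the rule name so the resulting ticket subject is prioritized.
    if not name.startswith(PREFIX):
        updated["name"] = f"{PREFIX} {name}".strip()
    updated["message"] = message
    # Add the channel only once, keeping existing notifications.
    notifications = list(updated.get("notifications", []))
    if {"uid": channel_uid} not in notifications:
        notifications.append({"uid": channel_uid})
    updated["notifications"] = notifications
    return updated


example_alert = {"name": "openqaworker-arm-3: host up alert", "notifications": []}
routed = route_alert_to_infra(
    example_alert,
    "infra-email",  # hypothetical channel uid
    "The IPMI management interface for this machine is inaccessible (again).",
)
print(routed["name"])
```

Applying the function twice leaves the name and the notification list unchanged, so it is safe to run against already-updated alerts.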

Created merge request https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/355

Actions #3

Updated by okurz about 4 years ago

  • Status changed from Feedback to Resolved

The merge request was merged and is effective, but so far the problem has not occurred again.
