action #71191
closed
inform EngInfra automatically if the IPMI interfaces are not accessible
0%
Description
As discussed in https://infra.nue.suse.com/SelfService/Display.html?id=175681#txn-2552251, at least rklein is OK with us automatically creating tickets based on alerts. So we can try defining another notification channel in https://stats.openqa-monitor.qa.suse.de/alerting/notifications and using it in the "long-time" alerts of arm-1 through arm-3.
Updated by okurz about 4 years ago
- Copied from action #69610: ipmi management interface of openqaworker-arm-3 is inaccessible added
Updated by okurz about 4 years ago
- Status changed from New to Feedback
I created a new notification channel "infra@suse.de" on https://stats.openqa-monitor.qa.suse.de/alerting/notifications with the email address infra@suse.de and with "Disable Resolve Message" enabled, to prevent any infra ticket from being reopened after EngInfra has solved the problem.
With that we can add a custom message to the alert. I tried to avoid the hostname in the text so that we can rely on the subject to identify the machine:
The IPMI management interface for this machine is inaccessible (again). The machine itself is also not reachable over ping.
Suggested action: Reset the machine including the management interface.
Similar issues were handled in https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Display.html?id=166330 and https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same machine
To have the ticket prioritized accordingly, I prefixed the name of the alert rule with [openqa].
Created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/355
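For illustration, the alert setup described in this comment could be sketched as follows. This is only a rough sketch, assuming a legacy Grafana dashboard alert that carries "name", "message" and "notifications" fields; it is not the content of the merge request. The function name build_alert, the rule name and the channel uid "infra-email" are hypothetical.

```python
import json

# Message text quoted from this ticket; it deliberately omits the hostname
# so that the machine is identified by the mail subject (the rule name).
ALERT_MESSAGE = (
    "The IPMI management interface for this machine is inaccessible (again). "
    "The machine itself is also not reachable over ping.\n"
    "Suggested action: Reset the machine including the management interface."
)


def build_alert(rule_name: str, channel_uid: str) -> dict:
    """Return an alert fragment in the style of a legacy Grafana dashboard panel.

    rule_name and channel_uid are hypothetical inputs, not values from the ticket.
    """
    return {
        # The "[openqa]" prefix is what should get the ticket prioritized.
        "name": f"[openqa] {rule_name}",
        "message": ALERT_MESSAGE,
        # Reference the notification channel by its uid (assumed value).
        "notifications": [{"uid": channel_uid}],
    }


if __name__ == "__main__":
    alert = build_alert("openqaworker-arm-3: IPMI unreachable alert", "infra-email")
    print(json.dumps(alert, indent=2))
```

Keeping the hostname only in the rule name means the same message text can be reused unchanged for all three arm workers.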
Updated by okurz about 4 years ago
- Status changed from Feedback to Resolved
The merge request was merged and is in effect. So far the problem has not occurred again.