action #59382
openqaworker-arm-1 is down, was automatically power cycled by grafana+gitlab, no reaction, power cycled again, SOL is unresponsive
Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2019-11-13
Due date:
% Done: 0%
Estimated time:
Description
Observation
The recovery job https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/138117 was triggered on 2019-11-11 after the alert that arm-1 was down, but according to the worker dashboard at https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-1/worker-dashoard-openqaworker-arm-1?orgId=1&refresh=1m&from=now-7d&to=now, arm-1 never came back.
Suggestions
We should escalate the alert when recovery fails, or at least retry several times until we see a response. For example, extend https://gitlab.suse.de/openqa/grafana-webhook-actions to run "sol activate", check whether there is any response on the console, and ping the host.
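A minimal sketch of the suggested retry-then-escalate logic, in Python. It assumes the webhook job can reach the worker's BMC with `ipmitool` over the `lanplus` interface and is allowed to ping the worker; the functions `recover`, `ipmi_power_cycle` and `ping_ok`, and all timing defaults, are hypothetical and not taken from grafana-webhook-actions. The "sol activate" check is not implemented here. An SOL-based probe could be passed as `is_up` instead of, or together with, `ping_ok`.

```python
import subprocess
import time
from typing import Callable


def ipmi_power_cycle(bmc_host: str, user: str, password: str) -> None:
    """Power cycle a machine through its BMC (hypothetical helper, uses ipmitool)."""
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-P", password,
         "chassis", "power", "cycle"],
        check=True,
    )


def ping_ok(host: str) -> bool:
    """Return True if the host answers one ICMP echo within 2 seconds (Linux ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def recover(
    power_cycle: Callable[[], None],
    is_up: Callable[[], bool],
    escalate: Callable[[str], None],
    attempts: int = 3,             # hypothetical default: number of power cycles
    boot_wait: float = 600.0,      # hypothetical default: seconds to wait per boot
    poll_interval: float = 30.0,   # hypothetical default: seconds between probes
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Power cycle up to `attempts` times; escalate if the host never responds."""
    for _ in range(attempts):
        power_cycle()
        waited = 0.0
        # Probe periodically until the boot window for this attempt is used up.
        while waited < boot_wait:
            sleep(poll_interval)
            waited += poll_interval
            if is_up():
                return True
    # Every attempt failed: hand over to a human instead of silently giving up.
    escalate(f"host still down after {attempts} power cycles")
    return False
```

Passing the power-cycle, probe and escalation steps as callables keeps the retry policy separate from the transport, so the same loop could drive an SOL probe or a different alerting channel without changes.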