action #120675
openqaworker-arm-1 not bootable via IPMI (closed)
Description
Observation
An alert email was sent to osd-admins by the recovery pipeline in GitLab:
The IPMI management interface for openqaworker-arm-1 is inaccessible (again).
The machine itself does not respond to ping either.
Suggested action: Reset the machine, including the management interface. Similar issues with the same machine were handled in https://infra.nue.suse.com/SelfService/Update.html?id=174650, https://infra.nue.suse.com/SelfService/Display.html?id=166330, https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124.
See also https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/1243753 for details.
There are also repeated emails with the subject "Switched Rack PDU: Outlet #1 (openqaworker-arm-1) on".
Acceptance criteria
- AC1: openqaworker-arm-1 bootable and survives reboots
Suggestions
- Manually attempt to access the machine via IPMI
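The manual check and the suggested reset from the observation could be scripted roughly as below. This is a hedged bash sketch, not the recovery pipeline's actual code: it assumes ipmitool is installed, that the BMC name follows the openqaworker-arm-1-ipmi.suse.de pattern that appears later in this ticket, and that credentials come from the hypothetical variables IPMI_USER and IPMI_PASS. The hypothetical DRY_RUN=1 switch only prints the commands. If the BMC does not answer at all, no ipmitool command can reset it, and an infra ticket like the ones linked in the observation is still needed.

```bash
#!/usr/bin/env bash
# Sketch only: manual IPMI check and reset for an openQA worker.
# Hypothetical inputs (not from the ticket): IPMI_USER, IPMI_PASS, DRY_RUN.
set -euo pipefail

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run an ipmitool subcommand against the BMC of worker $1, using the same
# interface (lanplus) and cipher suite (-C 3) as the command in this ticket.
ipmi() {
  local bmc="$1-ipmi.suse.de"
  shift
  run ipmitool -I lanplus -C 3 -H "$bmc" -U "$IPMI_USER" -P "$IPMI_PASS" "$@"
}

# Check the host and its BMC, then cold-reset the BMC and power-cycle the host.
check_and_reset() {
  local host="$1"
  run ping -c 3 -W 2 "$host" || echo "$host does not answer ping"
  if ! ipmi "$host" chassis power status; then
    echo "BMC of $host unreachable, needs an infra ticket" >&2
    return 1
  fi
  ipmi "$host" mc reset cold        # reset the management controller itself
  run sleep 60                      # give the BMC time to come back (guessed delay)
  ipmi "$host" chassis power cycle  # then power-cycle the machine
}
```

For example, after sourcing the file, `DRY_RUN=1 check_and_reset openqaworker-arm-1` prints the ping and ipmitool calls in order without touching the machine.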
Updated by mkittler about 2 years ago
This is likely just because the IPMI hosts have now been moved to the security zone, so we need to use a jump host. So this isn't really about arm-1 specifically but rather about updating all places where we have used IPMI commands so far.
Note that arm-1 is actually online right now.
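If the BMCs do end up behind a jump host, one way to update the places that call ipmitool is to run it on the jump host over SSH. A plain `ssh -L` forward would not do, because lanplus (RMCP+) talks UDP port 623 and SSH port forwarding only carries TCP. The bash sketch below assumes a hypothetical IPMI_JUMPHOST setting that names a host with ipmitool installed and access to the BMC network; IPMI_USER, IPMI_PASS and DRY_RUN are hypothetical as well. With IPMI_JUMPHOST unset, it behaves like the current direct call.

```bash
#!/usr/bin/env bash
# Sketch: run ipmitool directly or, if IPMI_JUMPHOST is set, on that jump host.
# IPMI_JUMPHOST, IPMI_USER, IPMI_PASS and DRY_RUN are hypothetical settings.
set -euo pipefail

# Usage: ipmi_via_jumphost <bmc-hostname> <ipmitool subcommand...>
ipmi_via_jumphost() {
  local bmc="$1"
  shift
  local -a cmd=(ipmitool -I lanplus -C 3 -H "$bmc" -U "$IPMI_USER" -P "$IPMI_PASS" "$@")
  if [[ -n "${IPMI_JUMPHOST:-}" ]]; then
    # ssh joins its remote arguments into one shell command line, so quote
    # every word with printf %q and drop the trailing space.
    local remote
    remote="$(printf '%q ' "${cmd[@]}")"
    cmd=(ssh "$IPMI_JUMPHOST" "${remote% }")
  fi
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"  # print instead of executing
  else
    "${cmd[@]}"
  fi
}
```

Keeping the jump host in one variable means the callers only change their setting, not their ipmitool arguments. Note that the password still appears on the command line on both hosts; ipmitool's `-f` password file option would avoid that.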
Updated by mkittler about 2 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Category deleted (Regressions/Crashes)
Updated by mkittler about 2 years ago
- Tracker changed from coordination to action
Updated by okurz about 2 years ago
- Project changed from openQA Infrastructure (public) to openQA Project (public)
- Status changed from New to Blocked
- Assignee set to okurz
The IPMI interface of openqaworker-arm-1 has not been moved behind the jump host yet, but DNS resolution has a problem. Reported as https://sd.suse.com/servicedesk/customer/portal/1/SD-104660
Updated by okurz about 2 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
Updated by okurz about 2 years ago
- Status changed from Blocked to Resolved
https://sd.suse.com/servicedesk/customer/portal/1/SD-104660 is closed. I can verify that, at least for me, DNS resolution seems to work properly again. I could also establish an IPMI connection using:
ipmitool -I lanplus -C 3 -H openqaworker-arm-1-ipmi.suse.de -U XXX -P YYY
The machine openqaworker-arm-1 is currently up, running fine, and was able to boot automatically.
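The two checks described here, DNS resolution of the BMC name followed by an IPMI connection, can be combined into a small bash function. This is a sketch under assumptions: getent and ipmitool are available, the credentials come from the hypothetical variables IPMI_USER and IPMI_PASS, and `chassis power status` is chosen here as a harmless read-only subcommand, since the command line quoted above does not show which subcommand was used.

```bash
#!/usr/bin/env bash
# Sketch: verify that a BMC name resolves and that IPMI answers.
# IPMI_USER and IPMI_PASS are hypothetical credential variables.
set -euo pipefail

verify_bmc() {
  local bmc="$1"
  # DNS check: getent queries the system resolver via NSS, roughly what
  # ipmitool itself will see when it looks up the host name.
  if ! getent hosts "$bmc" >/dev/null; then
    echo "DNS: $bmc does not resolve"
    return 1
  fi
  echo "DNS: $bmc resolves"
  # IPMI check: a read-only query over lanplus with cipher suite 3.
  if ipmitool -I lanplus -C 3 -H "$bmc" -U "$IPMI_USER" -P "$IPMI_PASS" chassis power status; then
    echo "IPMI: $bmc answers"
  else
    echo "IPMI: no answer from $bmc"
    return 1
  fi
}
```

Checking DNS first separates the failure seen in SD-104660 (name does not resolve) from a BMC that resolves but does not answer.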