action #120675

openqaworker-arm-1 not bootable via IPMI

Added by cdywan 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

An alert email was sent to osd-admins as per the recovery pipeline in GitLab:

The IPMI management interface for openqaworker-arm-1 is inaccessible (again).
The machine itself is also not reachable over ping.
Suggested action: Reset the machine including the management interface. Similar issues were handled in https://infra.nue.suse.com/SelfService/Update.html?id=174650 and https://infra.nue.suse.com/SelfService/Display.html?id=166330 and https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 for the same machine.
See also https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/1243753 for details.

There are also repeated emails with the subject "Switched Rack PDU: Outlet #1 (openqaworker-arm-1) on".

Acceptance criteria

  • AC1: openqaworker-arm-1 bootable and survives reboots

Suggestions

  • Manually attempt to access the machine via IPMI

History

#1 Updated by mkittler 3 months ago

This is likely because the IPMI hosts have now been moved to the security zone, so we need to use a jump host. This isn't really about arm-1 specifically; it is more about updating all places where we have used IPMI commands so far.

Note that arm-1 is actually online right now.

#2 Updated by mkittler 3 months ago

  • Project changed from openQA Project to openQA Infrastructure
  • Category deleted (Concrete Bugs)

#3 Updated by mkittler 3 months ago

  • Tracker changed from coordination to action

#4 Updated by okurz 3 months ago

  • Project changed from openQA Infrastructure to openQA Project
  • Status changed from New to Blocked
  • Assignee set to okurz

The IPMI for openqaworker-arm-1 has not been moved to the jump host as of now, but DNS resolution has a problem. Reported in https://sd.suse.com/servicedesk/customer/portal/1/SD-104660

#5 Updated by okurz 3 months ago

  • Project changed from openQA Project to openQA Infrastructure

#6 Updated by okurz 3 months ago

  • Status changed from Blocked to Resolved

https://sd.suse.com/servicedesk/customer/portal/1/SD-104660 is closed. I can verify that at least for me DNS resolution seems to work properly again. Also I could establish an IPMI connection using ipmitool -I lanplus -C 3 -H openqaworker-arm-1-ipmi.suse.de -U XXX -P YYY. The machine openqaworker-arm-1 is currently up and fine and could boot automatically.
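
The two checks described in the comment above (DNS resolution first, then an IPMI connection) could be scripted roughly as follows. This is only a sketch, not part of the recovery pipeline: the helper names resolves, ipmi_command and check_ipmi are hypothetical, the ipmitool options are copied from the command quoted above, and using "chassis power status" as the probe is an assumption. It also assumes ipmitool is installed and Python 3.9 or newer.

```python
import socket
import subprocess


def resolves(host: str) -> bool:
    """Return True if the host name resolves via the system resolver."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def ipmi_command(host: str, user: str, password: str, *args: str) -> list[str]:
    """Build the ipmitool argument list using the options from the ticket comment."""
    return ["ipmitool", "-I", "lanplus", "-C", "3",
            "-H", host, "-U", user, "-P", password, *args]


def check_ipmi(host: str, user: str, password: str) -> bool:
    """Hypothetical helper: check DNS first, then query the chassis power state."""
    if not resolves(host):
        print(f"{host}: DNS resolution failed")
        return False
    # Assumption: "chassis power status" serves as a harmless connectivity probe.
    cmd = ipmi_command(host, user, password, "chassis", "power", "status")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired) as err:
        print(f"{host}: could not run ipmitool: {err}")
        return False
    print(result.stdout.strip() or result.stderr.strip())
    return result.returncode == 0
```

Calling check_ipmi("openqaworker-arm-1-ipmi.suse.de", "XXX", "YYY") with real credentials in place of the placeholders would report a DNS failure separately from an IPMI failure, which is the distinction that mattered in this ticket.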
