action #131276

closed

SUSE Summer 2023 - AC failure in NUE1-SRV1

Added by okurz 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2023-06-22
Due date:
% Done: 0%
Estimated time:

Description

Motivation

We can help mitigate the air-conditioning (AC) failure by shutting down some of our machines in NUE1-SRV1.

Acceptance Criteria

  • AC1: As many non-essential machines as needed to keep the temperature sane are powered off, while basic services stay available
  • AC2: All LSG QE machines in SRV1 are up again after the AC issue is resolved

Suggestions

  • Review all LSG QE machines in NUE1-SRV1 and power them off where possible, e.g. w11+12 (worker11 and worker12) for sure, plus more where machines are redundant
  • Communicate the impact to users, e.g. in Fctry IRC and #eng-testing

Rollback steps

ipmi-openqaworker11-ipmi.qe-ipmi-ur power on
ipmi-openqaworker12-ipmi.qe-ipmi-ur power on
ssh osd "sudo salt-key -y -a worker11.oqa.suse.de && sudo salt-key -y -a worker12.oqa.suse.de"
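
If more workers than w11+12 end up powered off, the rollback steps above could be generated with a small shell loop. This is only a sketch. It assumes that every worker follows the same `ipmi-openqaworkerNN-ipmi.qe-ipmi-ur` and `workerNN.oqa.suse.de` naming seen above, which may not hold for all machines. The function name `rollback_commands` is hypothetical. The sketch only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print the rollback commands for the given worker numbers (e.g. 11 12).
# Assumption: each worker follows the naming pattern used for worker11/12 above.
rollback_commands() {
    # Power each machine back on via its IPMI helper command.
    for n in "$@"; do
        echo "ipmi-openqaworker${n}-ipmi.qe-ipmi-ur power on"
    done
    # Re-accept each machine's salt key on osd.
    for n in "$@"; do
        echo "ssh osd \"sudo salt-key -y -a worker${n}.oqa.suse.de\""
    done
}

rollback_commands 11 12
```

Printing rather than executing keeps the sketch safe to try on any host; the reviewed output can then be pasted into a shell that has the IPMI helper commands available.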