action #131276

closed

SUSE Summer 2023 - AC failure in NUE1-SRV1

Added by okurz 12 months ago. Updated 12 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2023-06-22
Due date:
% Done:

0%

Estimated time:

Description

Motivation

We can help mitigate the air conditioning (AC) failure by shutting down some of our machines in NUE1-SRV1.

Acceptance Criteria

  • AC1: As many non-essential machines as needed to keep the temperature within a safe range are powered off, while basic services remain available
  • AC2: All LSG QE machines in SRV1 are up again after the air conditioning issue is resolved

Suggestions

  • Review all LSG QE machines in NUE1-SRV1 and power off where possible, e.g. w11+12 for sure, also more where redundant
  • Communicate the impact to users, e.g. in Fctry IRC and #eng-testing

Rollback steps

ipmi-openqaworker11-ipmi.qe-ipmi-ur power on
ipmi-openqaworker12-ipmi.qe-ipmi-ur power on
ssh osd "sudo salt-key -y -a worker11.oqa.suse.de && sudo salt-key -y -a worker12.oqa.suse.de"
Actions #1

Updated by okurz 12 months ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to okurz

Monitoring https://bs-monitor.nue.suse.com:3000/d/paTR0FXnz/temperature-and-humidity-in-nuremberg-server-rooms?orgId=1&from=now-24h&to=now

ipmi-openqaworker11-ipmi.qe-ipmi-ur power off
ipmi-openqaworker12-ipmi.qe-ipmi-ur power off
ssh osd "sudo salt-key -y -d worker11.oqa.suse.de && sudo salt-key -y -d worker12.oqa.suse.de"
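
The power-off commands above and the rollback steps in the description follow the same pattern per worker, so they could be generated from a list of worker numbers. The following is only a sketch: the function name worker_commands is hypothetical, it assumes the ipmi-openqaworkerNN-ipmi.qe-ipmi-ur wrappers and the "osd" ssh host exist as named in this ticket, and it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the ticket): print the IPMI and
# salt-key commands for a list of worker numbers. Nothing is executed here;
# review the output and run it manually.
worker_commands() {
    action="$1"; shift            # "on" or "off"
    case "$action" in
        on)  salt_flag="-a" ;;    # accept the salt key again after power on
        off) salt_flag="-d" ;;    # delete the salt key after power off
        *)   echo "usage: worker_commands on|off NUM..." >&2; return 1 ;;
    esac
    for n in "$@"; do
        # Assumption: wrapper and host names follow the pattern used in this ticket
        echo "ipmi-openqaworker${n}-ipmi.qe-ipmi-ur power ${action}"
        echo "ssh osd \"sudo salt-key -y ${salt_flag} worker${n}.oqa.suse.de\""
    done
}
```

For example, `worker_commands off 11 12` prints the power-off and key-deletion commands for w11+12, and `worker_commands on 11 12` prints the matching rollback commands.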
Actions #2

Updated by openqa_review 12 months ago

  • Due date set to 2023-07-07

Setting due date based on mean cycle time of SUSE QE Tools

Actions #3

Updated by okurz 12 months ago

  • Status changed from In Progress to Feedback

The temperature looks OK for now. I will check again next week.

Actions #4

Updated by okurz 12 months ago

  • Due date deleted (2023-07-07)
  • Status changed from Feedback to Resolved

A resolution message reported that the situation has improved and the air conditioning is fixed. I will keep w11+12 powered off, as we have separate tickets to bring them back up and use them more efficiently.
