action #131276
SUSE Summer 2023 - AC failure in NUE1-SRV1 (closed)
Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
2023-06-22
Due date:
% Done:
0%
Estimated time:
Description
Motivation
We can help mitigate the AC (air conditioning) failure by shutting down some of our machines in NUE1-SRV1.
Acceptance Criteria
- AC1: As many non-essential machines as needed to keep the temperature within safe limits are powered off, while basic services remain available
- AC2: All LSG QE machines in SRV1 are up again after the AC issue is resolved
Suggestions
- Review all LSG QE machines in NUE1-SRV1 and power off where possible, e.g. w11+12 for sure, also more where redundant
- Communicate the impact to users, e.g. in Fctry IRC and #eng-testing
Rollback steps
ipmi-openqaworker11-ipmi.qe-ipmi-ur power on
ipmi-openqaworker12-ipmi.qe-ipmi-ur power on
ssh osd "sudo salt-key -y -a worker11.oqa.suse.de && sudo salt-key -y -a worker12.oqa.suse.de"
Updated by okurz over 1 year ago
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to okurz
ipmi-openqaworker11-ipmi.qe-ipmi-ur power off
ipmi-openqaworker12-ipmi.qe-ipmi-ur power off
ssh osd "sudo salt-key -y -d worker11.oqa.suse.de && sudo salt-key -y -d worker12.oqa.suse.de"
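The power-off commands above and the rollback steps in the description follow the same per-worker pattern, so they can be generated from a list of worker numbers. The following is only a dry-run sketch: `worker_cmds` is a hypothetical helper that is not part of the ticket or of any tool, and it prints the commands instead of running them. It assumes the `ipmi-openqaworker<N>-ipmi.qe-ipmi-ur` name and the `osd` SSH host resolve in the operator's environment as they do in the ticket. Unlike the ticket, it emits one `ssh` line per worker rather than a single chained one.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the per-worker commands.
# worker_cmds is a hypothetical helper introduced for illustration only.
worker_cmds() {
    action=$1; shift            # "on" or "off"
    case $action in
        on)  salt_flag=-a ;;    # salt-key -a: accept the minion key again
        off) salt_flag=-d ;;    # salt-key -d: delete the minion key
        *)   echo "usage: worker_cmds on|off N..." >&2; return 1 ;;
    esac
    for n in "$@"; do
        # Command and host name patterns are copied from the ticket.
        echo "ipmi-openqaworker${n}-ipmi.qe-ipmi-ur power ${action}"
        echo "ssh osd \"sudo salt-key -y ${salt_flag} worker${n}.oqa.suse.de\""
    done
}

# Print the power-off commands for workers 11 and 12.
worker_cmds off 11 12
```

Calling `worker_cmds on 11 12` prints the matching rollback commands. Reviewing the printed lines before pasting them keeps the operator in control of what is actually powered off.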
Updated by openqa_review over 1 year ago
- Due date set to 2023-07-07
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
- Status changed from In Progress to Feedback
Temperature looks OK for now. Will check again next week.
Updated by okurz over 1 year ago
- Due date deleted (2023-07-07)
- Status changed from Feedback to Resolved
There was a resolution message that the situation has improved and AC is fixed. I will keep w11+12 down as we have separate tickets to bring them up and use them more efficiently again.