action #131276

Updated by okurz 11 months ago

## Motivation 
 We can help with the air-conditioning (AC) failure by shutting down some of our machines in NUE1-SRV1. 

 ## Acceptance Criteria 
 * **AC1:** As many non-essential machines as needed to keep the temperature sane are powered off, while basic services remain available 
 * **AC2:** All LSG QE machines in SRV1 are up again after the air-conditioning issue is resolved 

 ## Suggestions 
 * Review all LSG QE machines in NUE1-SRV1 and power off where possible, e.g. openqaworker11 and openqaworker12 for sure, plus further machines where they are redundant 
 * Communicate the impact to users, e.g. in Fctry IRC and #eng-testing 

 ## Rollback steps 

 ``` 
 # Power the two workers back on via their IPMI aliases 
 ipmi-openqaworker11-ipmi.qe-ipmi-ur power on 
 ipmi-openqaworker12-ipmi.qe-ipmi-ur power on 
 # Re-accept the workers' salt keys on OSD 
 ssh osd "sudo salt-key -y -a worker11.oqa.suse.de && sudo salt-key -y -a worker12.oqa.suse.de" 
 ```
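
If more machines than worker11 and worker12 end up powered off, the rollback steps above could be generated per worker instead of typed by hand. The following is only a minimal sketch: `rollback_cmds` is a hypothetical helper name, and it assumes that every affected worker follows the same IPMI alias and salt minion naming pattern as the two workers listed in this ticket, which may not hold for other machines. It only prints the commands and does not execute anything.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) rollback commands for the given
# worker numbers. The command patterns are copied from the rollback steps in
# this ticket; that other workers use the same naming is an assumption.
rollback_cmds() {
    keys=""
    for n in "$@"; do
        # IPMI alias pattern as used for openqaworker11 and openqaworker12
        echo "ipmi-openqaworker${n}-ipmi.qe-ipmi-ur power on"
        # Collect one "salt-key accept" per worker, joined with &&
        keys="${keys:+$keys && }sudo salt-key -y -a worker${n}.oqa.suse.de"
    done
    # Emit a single ssh call to OSD that re-accepts all collected salt keys
    if [ -n "$keys" ]; then
        echo "ssh osd \"$keys\""
    fi
}

rollback_cmds 11 12
```

Called with `11 12`, the helper prints the same three commands as listed in the rollback steps, so the output can be reviewed before anything is pasted into a terminal.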