action #133748

Updated by okurz 10 months ago

## Motivation
In #132614 openqaworker-arm-1 was moved to the FC Basement so that we have one hot-redundant aarch64 OSD machine outside of PRG2. To set it up there, we also need to accommodate the automatic recovery feature.

## Acceptance criteria
* **AC1:** openqaworker-arm-1 runs OSD production jobs again
* **AC2:** Automatic recovery of openqaworker-arm-1 after crashes works

## Suggestions
* Disable the automatic recovery for openqaworker-arm-1 from the old location
* Mount the machine and connect it back to the network, including DHCP/DNS entries in https://gitlab.suse.de/OPS-Service/salt/
* Remove the old DHCP/DNS entries in https://gitlab.suse.de/OPS-Service/salt/
* Update https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
* Find out on https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Labs how the new PDU can be used
* Integrate the new PDU in https://gitlab.suse.de/openqa/grafana-webhook-actions

## Rollback steps
* Add openqaworker-arm-1 back to salt on OSD
* After openqaworker-arm-1 is back, remove the silences in https://monitor.qa.suse.de/alerting/silences
* Remove the "Mute All times" in https://monitor.qa.suse.de/alerting/routes for `__contacts__ =~ .*"Trigger reboot of openqaworker-arm-1".*`
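The route matcher in the last rollback step depends on two details that are worth checking before removing the mute timing. First, Grafana notification-policy regex matchers are anchored like Alertmanager's, which is why the pattern needs the leading and trailing `.*`. Second, the quotes around the contact name stop it from also matching a similarly named contact point. The following Python sketch assumes the `__contacts__` label holds comma-separated, double-quoted contact point names (as far as we know, this is the format produced by Grafana's legacy-alerting migration). The example label values and the `route_matches` helper are hypothetical.

```python
import re

# Matcher copied from the rollback step. Alertmanager-style "=~" matchers are
# anchored, so the whole label value must match (equivalent to re.fullmatch).
MATCHER = r'.*"Trigger reboot of openqaworker-arm-1".*'


def route_matches(contacts_label: str) -> bool:
    """Return True if a (hypothetical) __contacts__ value is caught by the route."""
    return re.fullmatch(MATCHER, contacts_label) is not None


# Hypothetical label values: comma-separated, double-quoted contact names.
for value in (
    '"Trigger reboot of openqaworker-arm-1"',
    '"admin","Trigger reboot of openqaworker-arm-1"',
    '"Trigger reboot of openqaworker-arm-10"',  # closing quote prevents a match
):
    print(f"{value}: {route_matches(value)}")
```

Without the surrounding quotes in the pattern, the route would also catch a contact point named for a hypothetical openqaworker-arm-10.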
