action #157753

Updated by okurz 8 months ago

## Motivation
In #132614 openqaworker-arm-1 was moved to FC Basement so that we have one hot-redundant aarch64 OSD machine outside of PRG2. For that setup we also need to accommodate the automatic recovery feature.

## Acceptance criteria
* **AC1:** Automatic recovery of openqaworker-arm-1 after crashes works
* **AC2:** openqaworker-arm-1 runs OSD production jobs in a stable way

## Suggestions
* Read #133748 for notes about PDU auto-control
* Find out on https://wiki.suse.net/index.php/SUSE-Quality_Assurance/Labs how the new PDU can be used
* Integrate the new PDU in https://gitlab.suse.de/openqa/grafana-webhook-actions
* After openqaworker-arm-1 is fully back, including recovery, remove the silences in https://monitor.qa.suse.de/alerting/silences
* Remove the "Mute All times" in https://monitor.qa.suse.de/alerting/routes for `__contacts__ =~ .*"Trigger reboot of openqaworker-arm-1".*`
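A minimal sketch for the silence-removal step, assuming the silences live in Grafana's built-in Alertmanager and that a Grafana service account token with alerting permissions is available. It uses Grafana's Alertmanager-compatible endpoints (`GET .../api/v2/silences`, `DELETE .../api/v2/silence/{id}`); the helper names, the substring match on the hostname and the `dry_run` default are hypothetical choices for illustration, not an existing tool. The "Mute All times" entry in the notification routes is not covered by this API and still has to be removed by hand.

```python
import json
import urllib.request

# Assumption: Grafana instance hosting the silences and its built-in Alertmanager API paths
GRAFANA_URL = "https://monitor.qa.suse.de"
SILENCES_PATH = "/api/alertmanager/grafana/api/v2/silences"
SILENCE_PATH = "/api/alertmanager/grafana/api/v2/silence/"


def silences_for_host(silences, host):
    """Return IDs of active silences whose matcher values or comment mention host.

    Hypothetical helper; plain substring matching, so "openqaworker-arm-1"
    would also match e.g. "openqaworker-arm-10". Review the list before expiring.
    """
    ids = []
    for silence in silences:
        if silence.get("status", {}).get("state") != "active":
            continue  # expired or pending silences need no action
        texts = [m.get("value", "") for m in silence.get("matchers", [])]
        texts.append(silence.get("comment", ""))
        if any(host in text for text in texts):
            ids.append(silence["id"])
    return ids


def _request(method, path, token):
    """Send an authenticated request and decode a JSON body if there is one."""
    req = urllib.request.Request(
        GRAFANA_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def remove_host_silences(host, token, dry_run=True):
    """Expire all active silences mentioning host; only print them when dry_run is set."""
    ids = silences_for_host(_request("GET", SILENCES_PATH, token), host)
    for silence_id in ids:
        if dry_run:
            print(f"would expire silence {silence_id}")
        else:
            _request("DELETE", SILENCE_PATH + silence_id, token)
    return ids
```

A first run such as `remove_host_silences("openqaworker-arm-1", token)` only lists the matching silences; passing `dry_run=False` afterwards expires them.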

## Rollback actions
* Bring openqaworker-arm-1 back into production, see https://progress.opensuse.org/projects/openqav3/wiki/#Bring-back-machines-into-salt-controlled-production
