action #158020

Updated by livdywan about 2 months ago

## Observation 

 https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611 
 ``` 
           ID: SUSE:SLE-15-SP6:Update:BCI 
     Function: cmd.run 
         Name: su geekotest -c 'mkdir -p SUSE:SLE-15-SP6:Update:BCI && python3 script/sctimeout: sending signal TERM to command 'ssh' 
 ``` 

 https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891 
 ``` 
           ID: stop_and_disable_all_not_configured_workers 
     Function: cmd.run 
         Name: services=$(systemctl list-units --all 'openqa-worker-auto-restart@*.service' | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' | awk '{ if($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' | tr '\n' ' '); [ -z "$services" ] || systemctl disable --ntimeout: sending signal TERM to command 'ssh' 
 ``` 

 ## Acceptance criteria 
 * **AC1:** openqa-piworker.qe.nue2.suse.org is responsive over salt again 
 * **AC2:** The team knows how to power-cycle/recover openqa-piworker 
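
One possible way to check AC1 is a salt `test.ping` against the minion, run on the salt master once the key has been accepted again. This is a minimal sketch, not the team's established procedure: the helper name `minion_responsive` and the 30-second timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the salt states repository):
# succeeds if the given minion answers a salt test.ping.
# The 30-second CLI timeout (-t) is an assumed value.
minion_responsive() {
    salt -t 30 "$1" test.ping
}
```

With the function defined in a root shell on the master, `minion_responsive openqa-piworker.qe.nue2.suse.org` would be expected to report `True` for a responsive minion.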

 ## Suggestions 
 * *DONE* Remove from production with `sudo salt-key -y -d openqa-piworker.qe.nue2.suse.org` 
 * Recover the machine with help from dheidler 
 * Add the relevant remote control information to https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls 
 * Add back to production 

 ## Rollback steps 
 * `ssh osd 'sudo salt-key -y -a openqa-piworker.qe.nue2.suse.org'`
