action #158020
Updated by livdywan about 2 months ago
## Observation

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611

```
ID: SUSE:SLE-15-SP6:Update:BCI
Function: cmd.run
Name: su geekotest -c 'mkdir -p SUSE:SLE-15-SP6:Update:BCI && python3 script/sctimeout: sending signal TERM to command 'ssh'
```

https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891

```
ID: stop_and_disable_all_not_configured_workers
Function: cmd.run
Name: services=$(systemctl list-units --all 'openqa-worker-auto-restart@*.service' | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' | awk '{ if($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' | tr '\n' ' '); [ -z "$services" ] || systemctl disable --ntimeout: sending signal TERM to command 'ssh'
```

## Acceptance criteria

* **AC1:** openqa-piworker.qe.nue2.suse.org responsive over salt again
* **AC2:** The team knows how to power-cycle/recover openqa-piworker

## Suggestions

* *DONE* Remove from production with `sudo salt-key -y -d openqa-piworker.qe.nue2.suse.org`
* Recover the machine with help from dheidler
* Add relevant remote control information in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls
* Add back to production

## Rollback steps

* `ssh osd 'sudo salt-key -y -a openqa-piworker.qe.nue2.suse.org'`
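
## Note on the worker filter

The `stop_and_disable_all_not_configured_workers` command quoted in the observation picks worker slots with `sed` and `awk` before calling `systemctl disable`. As a rough sketch for anyone debugging it, the same filter can be run offline against made-up input. The function name `list_unconfigured_units` and the sample lines are hypothetical. The threshold `16` is copied from the logged command and may differ per worker host.

```shell
# Hypothetical helper: reads `systemctl list-units`-style lines on stdin and
# prints the units for worker slots above 16 (threshold taken from the log).
list_unconfigured_units() {
    sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' \
        -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' \
    | awk '{ if ($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' \
    | tr '\n' ' '
}

# Made-up sample input, not real output from openqa-piworker.
sample='  openqa-worker-auto-restart@1.service  loaded active running openQA Worker #1
  openqa-worker-auto-restart@16.service loaded active running openQA Worker #16
  openqa-worker-auto-restart@17.service loaded inactive dead openQA Worker #17
  unrelated.service loaded active running Something else'

# Same shape as the logged command, but fed from the sample instead of systemctl.
services=$(printf '%s\n' "$sample" | list_unconfigured_units)
echo "$services"
```

With this sample, only the units for slot 17 should be listed, since slots 1 and 16 are at or below the threshold and the unrelated unit is dropped by the first `sed` expression.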