action #75448
closed
OSD deployment fails because openqaworker-arm-3.suse.de is not connected
Description
Observation
openqaworker-arm-3.suse.de could not be reached, so the OSD deployment failed:
openqaworker-arm-3.suse.de:
Minion did not return. [Not connected]
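When a deployment touches many workers, the unreachable ones can be picked out of the salt output mechanically. A minimal sketch, assuming salt's default text output, where each minion ID sits unindented on its own line followed by a colon and the error message is on the next indented line; the function name `find_unreachable_minions` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read salt's text output on stdin and print the ID of
# every minion whose result is "Minion did not return. [Not connected]".
# Assumes the default layout: an unindented "minion-id:" line followed by
# indented result lines.
find_unreachable_minions() {
    awk '
        # An unindented line ending in ":" starts a new minion block.
        /^[^[:space:]].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        # Report the current minion if its block carries the error.
        /Minion did not return\. \[Not connected\]/ && minion != "" { print minion; minion = "" }
    '
}
```

For example, `sudo salt '*' test.ping | find_unreachable_minions` would list only the minions that did not answer.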
Workaround
Remove openqaworker-arm-3.suse.de from the salt keys by running the command:
salt-key -d openqaworker-arm-3.suse.de
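`salt-key -d` asks for confirmation interactively. A sketch of a non-interactive wrapper, assuming it runs as root on the salt master; `-y` answers the confirmation prompt with yes. The function name `drop_minion_key` and the `DRY_RUN` variable are hypothetical additions for previewing the command:

```shell
#!/bin/sh
# Hypothetical wrapper: delete the accepted key of an unreachable minion so
# that salt no longer waits for it during a deployment.
# Set DRY_RUN=1 to print the command instead of running it.
drop_minion_key() {
    minion="$1"
    if [ -z "$minion" ]; then
        echo "usage: drop_minion_key <minion-id>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "salt-key -y -d $minion"
    else
        # -y skips the interactive confirmation prompt
        salt-key -y -d "$minion"
    fi
}
```

Running `DRY_RUN=1 drop_minion_key openqaworker-arm-3.suse.de` first shows exactly which key would be removed.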
Suggestion
When openqaworker-arm-3.suse.de is online again, please add it back to the salt keys.
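A sketch of the re-adding step, assuming the salt master can reach the worker directly and that a single `ping` answer counts as "online"; once the minion reconnects it resubmits its key, and `salt-key -a` accepts it (`-y` skips the confirmation). The function name `readd_minion_key` and the `PROBE` and `DRY_RUN` overrides are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: wait until a worker answers, then accept its salt key again.
# PROBE overrides the reachability check (default: one ICMP ping, 5 s timeout);
# DRY_RUN=1 prints the salt-key command instead of running it.
readd_minion_key() {
    host="$1"
    tries="${2:-10}"      # number of probe attempts
    interval="${3:-60}"   # seconds between attempts
    probe="${PROBE:-ping -c 1 -W 5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # $probe is intentionally unquoted so that it splits into command + options
        if $probe "$host" >/dev/null 2>&1; then
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "salt-key -y -a $host"
            else
                salt-key -y -a "$host"
            fi
            return
        fi
        i=$((i + 1))
        # only sleep if another attempt follows
        [ "$i" -lt "$tries" ] && sleep "$interval"
    done
    echo "$host still unreachable after $tries attempts" >&2
    return 1
}
```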
Updated by okurz almost 4 years ago
- Status changed from New to Blocked
- Assignee set to okurz
- Priority changed from Normal to High
- Target version set to Ready
Updated by okurz almost 4 years ago
- Status changed from Blocked to Resolved
Got the information in the EngInfra ticket that the machine was reset by mmaher. I can log in over ssh and confirm that the machine is up. However, it suffers from the same network problems as other hosts, see #75016, #75274 and #73633. So I did:
sudo salt-key -a openqaworker-arm-3.suse.de
sudo salt -l error -C 'openqaworker-arm-3*' cmd.run 'echo -e "#!/bin/sh\nlogger poo#73633 workarounds\nsleep 300; systemctl --failed --no-legend | sed \"s/\s.*//\" | xargs --no-run-if-empty systemctl restart\nexit 0" > /etc/init.d/boot.local && chmod +x /etc/init.d/boot.local && systemctl enable rc-local'
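The one-liner above writes a small `/etc/init.d/boot.local` that logs a marker, waits five minutes after boot and then restarts every unit in failed state. Written out as a readable script, a sketch of the same logic could look like this. The helper names and the `BOOT_DELAY` override are hypothetical; the script also adds `--plain` and skips a leading "●" marker, on the assumption that some systemd versions print that marker in front of failed units, which the original `sed "s/\s.*//"` would pass on instead of the unit name:

```shell
#!/bin/sh
# Readable sketch of the poo#73633 boot.local workaround:
# wait after boot, then restart every systemd unit that ended up failed.

# Hypothetical helper: read `systemctl --failed --no-legend` output on stdin
# and print one unit name per line, skipping a leading "●" marker if present.
failed_unit_names() {
    awk '{ if ($1 == "●") print $2; else if (NF > 0) print $1 }'
}

# Hypothetical entry point that boot.local would call.
restart_failed_units() {
    logger "poo#73633 workarounds"
    sleep "${BOOT_DELAY:-300}"   # give the flaky network time to come up
    systemctl --failed --no-legend --plain \
        | failed_unit_names \
        | xargs --no-run-if-empty systemctl restart
}
```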
However, there was no monitoring data in https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&panelId=4&fullscreen&edit&tab=alert . I found that the "telegraf" service on this machine was disabled. Did I do this elsewhere? I enabled the service again and waited about 30 minutes; now both https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m and https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?tab=queries&orgId=1&panelId=4&fullscreen&edit show data again.
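To avoid a silently disabled telegraf next time, a sketch of an idempotent check, assuming systemd and the unit name `telegraf` as above; the function name `ensure_service_enabled` is hypothetical. It only changes something if the unit is not enabled or not running, and prints what it did:

```shell
#!/bin/sh
# Hypothetical helper: make sure a systemd service is enabled and running.
ensure_service_enabled() {
    unit="$1"
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit already enabled"
    else
        # enable --now both enables the unit and starts it immediately
        systemctl enable --now "$unit" && echo "$unit enabled and started"
    fi
    # An enabled unit may still be stopped, so start it if it is not running.
    if ! systemctl is-active --quiet "$unit"; then
        systemctl start "$unit" && echo "$unit started"
    fi
}
```

On the worker this would be called as `sudo sh -c '. ./ensure.sh; ensure_service_enabled telegraf'` or pasted into an existing maintenance script.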