action #75448

OSD deployment fails because openqaworker-arm-3.suse.de is not connected

Added by Xiaojing_liu 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2020-10-28
Due date:
% Done: 0%
Estimated time:

Description

Observation

The salt minion on openqaworker-arm-3.suse.de is not connected, so the OSD deployment failed:

openqaworker-arm-3.suse.de:
    Minion did not return. [Not connected]

Workaround

Remove openqaworker-arm-3.suse.de from the salt keys by running:
salt-key -d openqaworker-arm-3.suse.de
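
To find out which minions need this workaround, the "Not connected" entries can be extracted from salt's output. The following is only a sketch: `not_connected_minions` is a hypothetical helper, and it assumes the two-line layout quoted in the Observation above (minion name followed by a colon, then an indented message).

```shell
#!/bin/sh
# Print the names of minions that salt reported as "Not connected".
# Reads salt's text output on stdin. Assumption: each result looks like
#   <minion>:
#       Minion did not return. [Not connected]
not_connected_minions() {
    awk '
        # An unindented line ending in ":" starts a new minion entry.
        /^[^[:space:]].*:$/ { name = substr($0, 1, length($0) - 1); next }
        # The line right after it tells us whether the minion is missing.
        /Minion did not return\. \[Not connected\]/ && name != "" { print name }
        # Any other line ends the current entry.
        { name = "" }
    '
}
```

It could be fed with something like `sudo salt '*' test.ping 2>&1 | not_connected_minions`. Review the resulting list before passing any name to `salt-key -d`.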

Suggestion

When openqaworker-arm-3.suse.de is back online, please add it to the salt keys again.

History

#1 Updated by okurz 9 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Priority changed from Normal to High
  • Target version set to Ready

#2 Updated by okurz 9 months ago

  • Status changed from Blocked to Resolved

I got the information in the EngInfra ticket that the machine was reset by mmaher. I can log in over ssh and can confirm the machine is up. However, it is suffering from the same network problems as other hosts; see #75016, #75274 and #73633 about this. So I did:

sudo salt-key -a openqaworker-arm-3.suse.de
sudo salt -l error -C 'openqaworker-arm-3*' cmd.run 'echo -e "#!/bin/sh\nlogger poo#73633 workarounds\nsleep 300; systemctl --failed --no-legend | sed \"s/\s.*//\" | xargs --no-run-if-empty systemctl restart\nexit 0" > /etc/init.d/boot.local && chmod +x /etc/init.d/boot.local && systemctl enable rc-local'
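
The second command writes a script to /etc/init.d/boot.local that waits 300 seconds after boot and then restarts every unit listed by `systemctl --failed`. As a more readable sketch of the unit-name extraction step (not the command that was actually run), a hypothetical helper `failed_unit_names` could look like this. It assumes the unit name is the first column, or the second one when the systemd version in use prints a leading status bullet.

```shell
#!/bin/sh
# Print only the unit names from `systemctl --failed --no-legend` output
# read on stdin. Assumption: the unit name is the first field, unless the
# line starts with a status bullet, in which case it is the second field.
failed_unit_names() {
    awk '{
        if ($1 ~ /\.[a-z]+$/) print $1      # e.g. "telegraf.service ..."
        else if (NF > 1)      print $2      # e.g. "<bullet> telegraf.service ..."
    }'
}
```

Used in the same pipeline as above, this would read `systemctl --failed --no-legend | failed_unit_names | xargs --no-run-if-empty systemctl restart`.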

However, there was no monitoring data in https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&panelId=4&fullscreen&edit&tab=alert . I found out that the "telegraf" service on this machine was disabled. Did I do this elsewhere? I enabled the service again, waited about 30 minutes, and now both https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m and https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?tab=queries&orgId=1&panelId=4&fullscreen&edit show data just fine again.
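
The telegraf check could be scripted so that a disabled monitoring service is noticed right after a worker is re-added. This is a minimal sketch, not what was run on the machine: `ensure_enabled` and the `SYSTEMCTL` override variable are hypothetical names introduced here, and only `systemctl is-enabled` and `systemctl enable --now` are real commands.

```shell
#!/bin/sh
# Enable and start a systemd unit if it is not enabled yet.
# SYSTEMCTL can be set to another command name for dry runs (assumption:
# it accepts the same arguments as systemctl); it defaults to systemctl.
ensure_enabled() {
    unit="$1"
    if "${SYSTEMCTL:-systemctl}" is-enabled --quiet "$unit"; then
        echo "$unit already enabled"
    else
        # --now starts the unit in addition to enabling it at boot.
        "${SYSTEMCTL:-systemctl}" enable --now "$unit" && echo "$unit enabled"
    fi
}
```

On the worker this would be called as `ensure_enabled telegraf` with root privileges. As noted above, it can still take a while before data shows up in the dashboards.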
