https://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842020-10-28T11:36:01ZopenSUSE Project Management ToolopenQA Infrastructure - action #75448: OSD deployment fails because openqaworker-arm-3.suse.de is not connectedhttps://progress.opensuse.org/issues/75448?journal_id=3443592020-10-28T11:36:01Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Blocked</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Target version</strong> set to <i>Ready</i></li></ul><p>Waiting on the EngInfra ticket <a href="https://infra.nue.suse.com/SelfService/Display.html?id=178773" class="external">https://infra.nue.suse.com/SelfService/Display.html?id=178773</a></p>
openQA Infrastructure - action #75448: OSD deployment fails because openqaworker-arm-3.suse.de is not connectedhttps://progress.opensuse.org/issues/75448?journal_id=3445632020-10-29T09:27:29Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>Resolved</i></li></ul><p>The EngInfra ticket reports that mmaher reset the machine. I can log in over ssh and confirm that the machine is up. However, it suffers from the same network problems as other hosts, see <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and... (Resolved)" href="https://progress.opensuse.org/issues/75016">#75016</a>, <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: [osd-admins][alert][learning] Failed systemd services alert (workers): os-autoinst-openvswitch.se... (Resolved)" href="https://progress.opensuse.org/issues/75274">#75274</a> and <a class="issue tracker-4 status-3 priority-6 priority-high2 closed behind-schedule" title="action: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panel... (Resolved)" href="https://progress.opensuse.org/issues/73633">#73633</a>. So I did:</p>
<pre><code>sudo salt-key -a openqaworker-arm-3.suse.de
sudo salt -l error -C 'openqaworker-arm-3*' cmd.run 'echo -e "#!/bin/sh\nlogger poo#73633 workarounds\nsleep 300; systemctl --failed --no-legend | sed \"s/\s.*//\" | xargs --no-run-if-empty systemctl restart\nexit 0" > /etc/init.d/boot.local && chmod +x /etc/init.d/boot.local && systemctl enable rc-local'
</code></pre>
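<p>The one-liner above packs the whole boot script into an escaped <code>echo</code>. Unpacked, the generated <code>/etc/init.d/boot.local</code> amounts to roughly the following sketch. The function names and the <code>RUN_WORKAROUND</code> guard are hypothetical additions for readability and are not part of the deployed script, and the POSIX class <code>[[:space:]]</code> stands in for the GNU-specific <code>\s</code>.</p>

```shell
#!/bin/sh
# Readable sketch of the boot.local workaround for poo#73633.
# Function names and RUN_WORKAROUND are illustrative assumptions.

# Keep only the first whitespace-separated field of each input line,
# i.e. the unit name in `systemctl --failed --no-legend` output.
failed_unit_names() {
    sed 's/[[:space:]].*//'
}

# Restart every unit currently reported as failed.
# xargs --no-run-if-empty skips the restart when nothing has failed.
restart_failed_units() {
    systemctl --failed --no-legend | failed_unit_names |
        xargs --no-run-if-empty systemctl restart
}

# Hypothetical guard so the file can be sourced without side effects.
if [ "${RUN_WORKAROUND:-0}" = 1 ]; then
    logger "poo#73633 workarounds"
    sleep 300   # give the network five minutes to settle after boot
    restart_failed_units
    exit 0
fi
```

<p>Running it with <code>RUN_WORKAROUND=1</code> as root would reproduce what the deployed script does at boot: log a marker, wait five minutes, then restart whatever systemd reports as failed.</p>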
<p>However, no monitoring data showed up in <a href="https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&panelId=4&fullscreen&edit&tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&panelId=4&fullscreen&edit&tab=alert</a>. I found out that the "telegraf" service on this machine was disabled. Did I do this elsewhere? I enabled the service again, waited about 30 minutes, and now both <a href="https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m" class="external">https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m</a> and <a href="https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?tab=queries&orgId=1&panelId=4&fullscreen&edit" class="external">https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?tab=queries&orgId=1&panelId=4&fullscreen&edit</a> show data just fine again.</p>
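<p>The comment does not record the exact command used to re-enable telegraf. One way to check and repair a disabled unit is sketched below. The <code>ensure_enabled</code> helper and the <code>SYSTEMCTL</code> override are hypothetical names introduced for illustration; only <code>systemctl is-enabled</code> and <code>systemctl enable --now</code> are standard systemd calls.</p>

```shell
#!/bin/sh
# Sketch: enable and start a unit only if it is not already enabled.
# SYSTEMCTL is a hypothetical override hook, useful for dry runs.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

ensure_enabled() {
    unit=$1
    # is-enabled exits 0 when the unit is enabled, non-zero otherwise.
    if "$SYSTEMCTL" is-enabled --quiet "$unit"; then
        echo "$unit already enabled"
    else
        # --now starts the unit in addition to enabling it for future boots.
        "$SYSTEMCTL" enable --now "$unit" && echo "$unit enabled and started"
    fi
}
```

<p>On the worker this could be run as root with <code>ensure_enabled telegraf</code>. Expect a delay before the dashboards show data again, as happened here.</p>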