openqaworker-arm-3 has been down since 2020-03-16, IPMI also unresponsive
https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&from=now-7d&to=now shows that openqaworker-arm-3 has been unresponsive since 2020-03-16, and there is also no response over IPMI.
#1 Updated by okurz almost 2 years ago
- Status changed from In Progress to Blocked
Created https://infra.nue.suse.com/SelfService/Display.html?id=166330 with the text:
openqaworker-arm-3 has been unresponsive since 2020-03-16, and there is also no response over IPMI, similar to https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 previously. Please recover it, e.g. by resetting IPMI and the machine, but please *also* look into why the management interface breaks down repeatedly. Reference: https://progress.opensuse.org/issues/64737
#6 Updated by okurz almost 2 years ago
- Status changed from Blocked to In Progress
jdsn helped to recover the machine by rebooting. He wrote
The machine
* did not react on ipmi
* did not show any output on the vga port
* did not react on key presses on a USB keyboard
I asked if he can help to find out why the management interface repeatedly disappears.
I accepted the salt key on osd with
salt-key -A
and I am upgrading with
zypper dup
and applying the high state with
salt -l error --no-color -C '*arm-3*' state.apply
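The three steps above can be collected into a small script. This is only a sketch: the `run` wrapper, the `recover_worker` function and the `DRY_RUN` and `WORKER_GLOB` variables are hypothetical names introduced here, and it assumes `zypper dup` is meant to run on the worker while the two salt commands run on osd. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the recovery steps from this ticket; not an official procedure.
# DRY_RUN and WORKER_GLOB are hypothetical variables introduced for illustration.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print commands, 0 = execute them
WORKER_GLOB="${WORKER_GLOB:-*arm-3*}"   # salt compound target, as used in the ticket

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

recover_worker() {
    # On osd: accept all pending salt minion keys.
    run salt-key -A
    # Assumption: this step is run on the worker itself, not on osd.
    run zypper dup
    # On osd: apply the high state to the matching worker.
    run salt -l error --no-color -C "$WORKER_GLOB" state.apply
}

recover_worker
```

Running it with `DRY_RUN=0` executes the commands for real, so in practice the steps would have to be split between osd and the worker.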
#7 Updated by okurz almost 2 years ago
- Status changed from In Progress to Resolved
jdsn recommends looking into the logs of the management interface and does not see himself able to debug vendor problems, which is understandable :)
I have enabled email notifications for alerts on https://openqaworker-arm-3-ipmi.suse.de/index.html, the same as on arm-1 and arm-2.
The worker is upgraded and jobs are being picked up and executed, e.g. https://stackoverflow.com/a/10236104 is fine.