action #64737
Closed: openqaworker-arm-3 has been down since 2020-03-16, IPMI also unresponsive
Description
Observation
https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&from=now-7d&to=now shows that openqaworker-arm-3 has been unresponsive since 2020-03-16, and it does not respond over IPMI either.
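The "no response over IPMI" part of the observation can be checked by hand by asking the BMC for its chassis power state. Below is a minimal sketch, assuming ipmitool with the lanplus interface, the BMC hostname openqaworker-arm-3-ipmi.suse.de that appears later in this ticket, and a hypothetical IPMI_USER variable (defaulting to the placeholder "admin"); ipmitool's -E flag reads the password from the IPMI_PASSWORD environment variable. Setting DRY_RUN=1 only prints the command.

```shell
#!/bin/sh
# Sketch: check whether a worker's BMC still answers over IPMI.
# IPMI_USER and its default "admin" are placeholders; ipmitool -E reads
# the password from the IPMI_PASSWORD environment variable.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# An unresponsive BMC makes ipmitool time out and exit non-zero.
check_ipmi() {
    bmc="$1"
    if run ipmitool -I lanplus -H "$bmc" -U "${IPMI_USER:-admin}" -E chassis power status; then
        echo "IPMI OK: $bmc"
    else
        echo "IPMI unresponsive: $bmc" >&2
        return 1
    fi
}

if [ "$#" -gt 0 ]; then
    check_ipmi "$1"
fi
```

Usage (script name hypothetical): sh check-ipmi.sh openqaworker-arm-3-ipmi.suse.de; the non-zero exit status on failure makes it easy to call from other tooling.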
Updated by okurz over 3 years ago
- Status changed from In Progress to Blocked
Created https://infra.nue.suse.com/SelfService/Display.html?id=166330 with the text:
openqaworker-arm-3 has been unresponsive since 2020-03-16 and also does not respond over IPMI, similar to https://infra.nue.suse.com/SelfService/Display.html?id=164419 and https://infra.nue.suse.com/SelfService/Display.html?id=153124 previously. Please recover it, e.g. by resetting IPMI and the machine, but please *also* look into why the management interface breaks down repeatedly.
Reference: https://progress.opensuse.org/issues/64737
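A remote attempt at "reset IPMI and machine" only works while the BMC still answers on the network; when it does not respond at all, someone has to reset the machine on site. A minimal sketch, assuming ipmitool with the lanplus interface, a placeholder IPMI_USER (default "admin"), the password in IPMI_PASSWORD (read via -E), and an arbitrary 120-second wait configurable through the hypothetical BMC_RESET_WAIT variable: it cold-resets the management controller, waits, then power-cycles the host. DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: reset a worker's BMC and then power-cycle the host over IPMI.
# Works only while the BMC still answers on the network.
# IPMI_USER (default "admin") and BMC_RESET_WAIT (default 120 s) are placeholders.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

recover_via_ipmi() {
    bmc="$1"
    user="${IPMI_USER:-admin}"
    # Cold-reset the management controller itself.
    run ipmitool -I lanplus -H "$bmc" -U "$user" -E mc reset cold || return 1
    # The BMC needs a while to reboot before it accepts commands again.
    run sleep "${BMC_RESET_WAIT:-120}"
    # Power the host off and on again (hard power cycle, not a clean shutdown).
    run ipmitool -I lanplus -H "$bmc" -U "$user" -E chassis power cycle
}

if [ "$#" -gt 0 ]; then
    recover_via_ipmi "$1"
fi
```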
Updated by okurz over 3 years ago
Removed the salt key for openqaworker-arm-3 for now to prevent failures when applying salt states.
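Deleting a minion's key on the salt master (osd) means the minion is no longer targeted, so state runs skip the unreachable host instead of failing on it. A minimal sketch; the minion ID openqaworker-arm-3.suse.de used in the test is an assumption (salt-key -L lists the real IDs), and -y answers salt-key's confirmation prompt. DRY_RUN=1 only prints the command.

```shell
#!/bin/sh
# Sketch: temporarily drop an unreachable worker from salt on the master.
# The minion ID passed in is an assumption; check `salt-key -L` first.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Delete the accepted key so salt no longer targets the host.
remove_minion_key() {
    run salt-key -y -d "$1"
}

if [ "$#" -gt 0 ]; then
    remove_minion_key "$1"
fi
```

Once the machine is back, its key typically reappears as pending after the minion reconnects and can then be accepted again.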
Updated by okurz over 3 years ago
- Blocks action #61844: auto_review:"download failed: 521 - Connect timeout" Network issues on openqaworker-arm-3 (and others) added
Updated by okurz over 3 years ago
Asked aneisner in chat whether there is anything I can do to help.
Updated by okurz over 3 years ago
Unfortunately there was no response from aneisner. I escalated the issue with SUSE-IT repeatedly, but nothing has happened so far. I asked runger for help with the escalation.
Updated by okurz over 3 years ago
- Status changed from Blocked to In Progress
jdsn helped recover the machine by rebooting it. He wrote:
The machine
* did not react to IPMI
* did not show any output on the VGA port
* did not react to key presses on a USB keyboard
I asked whether he could help find out why the management interface repeatedly disappears.
I accepted the salt key on osd with salt-key -A and am now:
* upgrading with zypper dup
* applying the high state with salt -l error --no-color -C '*arm-3*' state.apply
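The recovery steps above can be collected into a small sketch. Assumptions: the salt commands run on the salt master (osd), zypper runs on the worker itself (the ticket does not say where), and -y / --non-interactive are added so the steps run unattended. Note that salt-key -A accepts every pending key, not only this worker's. DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: bring a recovered worker back under salt control.
# Each step function is meant to run on the host named in its comment.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# On osd: accept all pending minion keys without prompting.
accept_keys() {
    run salt-key -y -A
}

# On the worker (assumption): distribution upgrade without prompts.
upgrade_worker() {
    run zypper --non-interactive dup
}

# On osd: apply the high state to minions matching the compound target.
apply_highstate() {
    run salt -l error --no-color -C "$1" state.apply
}

# Dispatch, e.g.: sh recover.sh apply_highstate '*arm-3*'
if [ "$#" -gt 0 ]; then
    "$@"
fi
```

state.apply without an SLS argument runs the full high state, matching the command used in this ticket.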
Updated by okurz over 3 years ago
- Status changed from In Progress to Resolved
jdsn recommends looking into the logs of the management interface and does not see himself able to debug vendor problems, which is understandable :)
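One place to start with the management interface's logs is the BMC's System Event Log (SEL), readable over IPMI while the BMC answers. A minimal sketch, assuming ipmitool with the lanplus interface, a placeholder IPMI_USER (default "admin") and the password in IPMI_PASSWORD (read via -E); it prints the SEL summary and the extended listing to stdout. DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: dump the BMC's System Event Log for later analysis.
# IPMI_USER (default "admin") is a placeholder; -E reads IPMI_PASSWORD.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

dump_sel() {
    bmc="$1"
    user="${IPMI_USER:-admin}"
    # Summary of the log (entry count, free space, last add/erase times).
    run ipmitool -I lanplus -H "$bmc" -U "$user" -E sel info
    # Full listing with sensor names looked up from the SDR.
    run ipmitool -I lanplus -H "$bmc" -U "$user" -E sel elist
}

if [ "$#" -gt 0 ]; then
    dump_sel "$1"
fi
```

Usage (file names hypothetical): sh dump-sel.sh openqaworker-arm-3-ipmi.suse.de > arm-3-sel.txt, so the log can be attached to a vendor or infra ticket.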
I have enabled email notifications for alerts on https://openqaworker-arm-3-ipmi.suse.de/index.html, the same as on arm-1 and arm-2.
The worker is upgraded and picks up and executes jobs again, e.g. https://stackoverflow.com/a/10236104 is fine.