action #58346
closed
o3 openqaworker1 and openqaworker4 are completely down on 2019-10-18
Start date: 2019-10-18
Due date:
% Done: 0%
Estimated time:
Added by okurz about 5 years ago. Updated about 5 years ago.
Checked responsiveness of both hosts over IPMI SOL, but there was no output at all. Power status was on. Power cycled both machines, and both are up again. Side effect: the only x86_64 workers that were up during the outage were imagetester:1 and :2, and they did not seem to be very stable: https://openqa.opensuse.org/tests/1059689#next_previous shows two "random" failures in a row.
I will check whether this happens again to see what I can do about debugging. I could apply the same monitor+reboot check as is done for aarch64.o.o.
openqaworker1 and openqaworker4 were down on 2019-10-19, and potentially one more time in the past few days.
Oct 19 03:30:37 openqaworker4 systemd-journald[777]: Journal stopped
-- Reboot --
Oct 20 09:21:15 openqaworker4 kernel: microcode: microcode updated early to revision 0x43, date = 2019-03-01
This was after a forced power cycle. I suspect a recent kernel upgrade.
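To put a number on the gap in the journal excerpt, here is a small sketch that computes the time between the last entry before the reboot and the first entry after it. The function names are made up for illustration, and the year is an assumption because syslog-style timestamps do not carry one. The result is the time the journal was silent, which is only an approximation of how long the host was actually unresponsive.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line.

    The year is not part of the timestamp, so it has to be supplied.
    Note: %b matches English month abbreviations under the default C locale.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def journal_gap(last_before, first_after, year=2019):
    """Return the time between two journal lines as a timedelta."""
    return parse_syslog_time(first_after, year) - parse_syslog_time(last_before, year)


# The two lines quoted in this ticket for openqaworker4
last_before = "Oct 19 03:30:37 openqaworker4 systemd-journald[777]: Journal stopped"
first_after = (
    "Oct 20 09:21:15 openqaworker4 kernel: microcode: "
    "microcode updated early to revision 0x43, date = 2019-03-01"
)

gap = journal_gap(last_before, first_after)
print(gap)  # 1 day, 5:50:38
```

So the journal on openqaworker4 was silent for roughly 30 hours before the forced power cycle.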
Added recovery to okurz's crontab on lord.arch, same as for aarch64.o.o. Let's see if these trigger at all and how often.
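For reference, a minimal sketch of what such a monitor+reboot check could look like. This is not the actual crontab entry, which is not shown in this ticket. It assumes a Linux `ping` (iputils) and `ipmitool` with the `lanplus` interface. The function names, the BMC hostname, and the credentials are placeholders.

```python
import subprocess


def is_reachable(host, timeout_s=5):
    """Return True if the host answers a single ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ipmi_command(bmc_host, user, password, *args):
    """Build an ipmitool command line for the given BMC (placeholder values)."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, *args]


def recover_if_down(host, bmc_host, user, password,
                    reachable=is_reachable, run=subprocess.run):
    """Power cycle the machine over IPMI if it does not answer a ping.

    Returns True if a power cycle was triggered, False otherwise.
    `reachable` and `run` are injectable so the logic can be tried
    without touching real hardware.
    """
    if reachable(host):
        return False
    run(ipmi_command(bmc_host, user, password, "chassis", "power", "cycle"),
        check=True)
    return True
```

Called periodically from cron, for example as `recover_if_down("openqaworker4", "<bmc-host>", "<user>", "<password>")`, this would power cycle the host whenever it stops answering pings. A single missed ping is a crude signal, so a real check would probably want a few retries before forcing a reboot.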