action #58346

closed

o3 openqaworker1 and openqaworker4 are completely down on 2019-10-18

Added by okurz about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: -
Start date: 2019-10-18
Due date:
% Done: 0%
Estimated time:

Related issues 1 (0 open, 1 closed)

Has duplicate openQA Infrastructure (public) - action #58403: openqaworker1 and w4 are repeatedly down | Rejected | okurz | 2019-10-20 | 2019-11-10

Actions #1

Updated by okurz about 5 years ago

  • Priority changed from Urgent to Normal

Checked responsiveness of both hosts over IPMI SOL, but there was no output. Power status was on. Power cycled both machines; both are up again. Side effect: the only x86_64 workers that were up during the outage were imagetester:1 and :2, and they did not seem very stable: https://openqa.opensuse.org/tests/1059689#next_previous shows two "random" failures in a row.
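The check-then-power-cycle step described in this comment could be scripted roughly as below. This is only a sketch, assuming `ipmitool` is installed and the BMC is reachable over the `lanplus` interface. The BMC host name and credentials are placeholders, and the helper names are hypothetical, not the commands that were actually run.

```python
import subprocess


def ipmi_command(bmc_host, user, password, *action):
    """Build an ipmitool argument list for a remote BMC (IPMI over LAN)."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, *action]


def is_power_on(status_output):
    """Interpret `chassis power status` output such as 'Chassis Power is on'."""
    return status_output.strip().lower().endswith("is on")


def power_cycle_if_on(bmc_host, user, password, run=subprocess.run):
    """Power cycle the machine only if the BMC reports it as powered on.

    `run` is injectable so the logic can be exercised without real hardware.
    Returns True if a power cycle was requested, False otherwise.
    """
    status = run(
        ipmi_command(bmc_host, user, password, "chassis", "power", "status"),
        capture_output=True, text=True, check=True,
    ).stdout
    if not is_power_on(status):
        return False
    run(ipmi_command(bmc_host, user, password, "chassis", "power", "cycle"),
        check=True)
    return True
```

A machine that reports "on" but shows nothing on the serial console is the case seen here; a machine that reports "off" would need `chassis power on` instead, which this sketch deliberately does not attempt.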

Actions #2

Updated by okurz about 5 years ago

  • Due date set to 2019-11-03
  • Status changed from In Progress to Feedback
  • Priority changed from Normal to Low

I will check whether this happens again to see what I can do about debugging. I could apply the same monitor+reboot check as is done for aarch64.o.o.

openqaworker1 and w4 were down on 2019-10-19, and potentially one more time in the past few days.

Oct 19 03:30:37 openqaworker4 systemd-journald[777]: Journal stopped
-- Reboot --
Oct 20 09:21:15 openqaworker4 kernel: microcode: microcode updated early to revision 0x43, date = 2019-03-01

The reboot on Oct 20 followed a forced power cycle. I suspect a recent kernel upgrade as the cause.

Actions #3

Updated by okurz about 5 years ago

  • Has duplicate action #58403: openqaworker1 and w4 are repeatedly down added

Actions #4

Updated by okurz about 5 years ago

  • Due date deleted (2019-11-03)
  • Status changed from Feedback to Resolved

Added recovery to okurz's crontab on lord.arch, same as for aarch64.o.o. Let's see if these trigger at all and how often.
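The actual crontab entry is not shown in this ticket. As a rough illustration of what a monitor+recover check run periodically from cron might look like, here is a minimal sketch. It assumes a Linux `ping` (iputils, where `-W` is the reply timeout in seconds); the function names and the `recover` callback are hypothetical and stand in for whatever recovery action is used, for example an IPMI power cycle.

```python
import subprocess


def host_is_up(host, attempts=3, run=subprocess.run):
    """Return True if the host answers at least one of `attempts` pings."""
    for _ in range(attempts):
        result = run(
            ["ping", "-c", "1", "-W", "5", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
    return False


def check_and_recover(hosts, recover, is_up=host_is_up):
    """Call `recover(host)` for every host that fails the reachability check.

    Returns the list of hosts for which recovery was triggered, so the
    caller can log how often the check fires.
    """
    recovered = []
    for host in hosts:
        if not is_up(host):
            recover(host)
            recovered.append(host)
    return recovered
```

Retrying the ping a few times before declaring a host down keeps a single lost packet from triggering a reboot, and returning the list of recovered hosts makes it easy to record whether the check triggers at all and how often.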
