action #116473

open

Add OSD PowerPC workers to automatic recovery we already have for ARM workers

Added by mkittler over 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2022-09-12
Due date:
% Done: 0%
Estimated time:
Tags:

Description

These workers often fail in the same way as the ARM workers¹, and at this point we should not need to recover them manually so frequently.

Suggestions

  • Note that for these workers a power cycle does not always work, but a power reset always seems to. So that detail may need to be adjusted for PowerPC workers.
  • I suppose all PowerPC workers controllable via IPMI should be considered (see workerconf.sls in the salt pillars).
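A minimal sketch of how the two suggestions could be combined, assuming the worker data from workerconf.sls has already been loaded into a plain dict. The dict shape (keys `arch` and `ipmi_host`) and both helper functions are hypothetical; the real pillar layout and the existing ARM recovery code may look different. Only the ipmitool options used (`-I lanplus`, `-H`, `-U`, `-E`, `chassis power reset|cycle`) are real; `-E` reads the password from the `IPMI_PASSWORD` environment variable so it does not end up on the command line.

```python
from typing import Dict, List

# Hypothetical shape of the parsed worker configuration; the real
# workerconf.sls in the salt pillars may be structured differently.
WorkerConf = Dict[str, Dict[str, str]]


def powerpc_ipmi_workers(conf: WorkerConf) -> List[str]:
    """Select PowerPC workers that have an IPMI host configured."""
    return sorted(
        host
        for host, data in conf.items()
        if data.get("arch", "").startswith("ppc64") and "ipmi_host" in data
    )


def ipmi_recovery_command(arch: str, ipmi_host: str, user: str) -> List[str]:
    """Build the ipmitool invocation used to recover one worker."""
    # The ticket reports that a power cycle does not always work on PowerPC
    # while a power reset does; using "cycle" for other architectures is an
    # assumption about what the existing ARM recovery does.
    action = "reset" if arch.startswith("ppc64") else "cycle"
    # -E takes the password from the IPMI_PASSWORD environment variable.
    return ["ipmitool", "-I", "lanplus", "-H", ipmi_host, "-U", user, "-E",
            "chassis", "power", action]
```

For example, with a hypothetical entry `{"qa-power8-5": {"arch": "ppc64le", "ipmi_host": "qa-power8-5-ipmi.example"}}`, the host is selected and its command ends in `chassis power reset`. Keeping the command construction separate from its execution makes the PowerPC-specific detail easy to test without touching real hardware.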

¹ They crash randomly and the logs just end without further clues, e.g. #114565#note-40. In addition, they sometimes get stuck at boot.
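Because the workers sometimes get stuck at boot, a single reset may not be enough. A minimal sketch of a retry loop, assuming the recovery can be expressed as two callables: `reset` (e.g. running the ipmitool command) and `is_up` (e.g. a ping or SSH check). The function name, parameters and default timeouts are hypothetical, not taken from the existing ARM recovery.

```python
import time
from typing import Callable


def recover_worker(
    reset: Callable[[], None],
    is_up: Callable[[], bool],
    attempts: int = 3,
    boot_timeout: float = 600.0,
    poll_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Reset a worker and wait for it to come up, retrying if it hangs at boot.

    Returns True as soon as is_up() reports the worker as available and
    False if it is still down after all attempts.
    """
    for _ in range(attempts):
        reset()
        waited = 0.0
        # Poll until the boot timeout expires; a worker stuck at boot never
        # reports as up, so the outer loop issues another reset.
        while waited < boot_timeout:
            sleep(poll_interval)
            waited += poll_interval
            if is_up():
                return True
    return False
```

Injecting `sleep` keeps the loop testable without real waiting. A `False` result would be the point at which a human still needs to look at the machine.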


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure (public) - action #114565: recover qa-power8-4+qa-power8-5 size:M (Resolved, okurz, 2022-12-19)
Related to openQA Infrastructure (public) - action #116437: Recover qa-power8-5 size:M (Resolved, mkittler)

Actions #1

Updated by okurz over 2 years ago

  • Target version set to future

Can you provide a little bit more context regarding "failing similarly"? Didn't you also have a bug report and there were suggestions regarding kdump and such?

Actions #2

Updated by mkittler over 2 years ago

> Can you provide a little bit more context regarding "failing similarly"?

There's not much to say about it. They crash randomly and the journal gives no clues; it simply ends at some point. In addition, they sometimes get stuck at boot.

> Didn't you also have a bug report and there were suggestions regarding kdump and such?

Yes. I can link the relevant progress ticket for additional context. However, I'm not sure whether we can fix this problem anytime soon.

Actions #3

Updated by mkittler over 2 years ago

  • Description updated
Actions #4

Updated by mkittler over 2 years ago

  • Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added
Actions #5

Updated by mkittler over 2 years ago

Actions #6

Updated by mkittler over 2 years ago

  • Description updated
Actions #7

Updated by okurz almost 2 years ago

  • Tags set to infra