action #116473
open
Add OSD PowerPC workers to automatic recovery we already have for ARM workers
0%
Description
These workers often fail in a similar way to the ARM workers¹, and at this point we should not need to recover them manually so frequently.
Suggestions
- Note that for these workers a power cycle does not always work but power reset seems to work always. So maybe that detail needs to be adjusted for PowerPC workers.
- I suppose all PowerPC workers controllable via IPMI should be considered (see workerconf.sls in salt pillars).
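The first suggestion could be sketched roughly as follows. This is only an illustration, not the existing recovery automation: the function names, the architecture keys, and the assumption that the ARM recovery currently issues a power cycle are all hypothetical. The only details taken from the ticket are that the workers are controlled via IPMI and that power reset appears more reliable than power cycle on PowerPC. The ipmitool invocation uses the standard `chassis power cycle|reset` subcommands.

```python
import subprocess

# Assumption: the ARM recovery uses "cycle"; per this ticket, "reset" seems to
# work more reliably on PowerPC. The architecture keys are illustrative only.
POWER_ACTION_BY_ARCH = {
    "aarch64": "cycle",
    "ppc64le": "reset",
}


def build_ipmi_command(host, user, password, arch):
    """Return the ipmitool argument list for recovering one worker."""
    action = POWER_ACTION_BY_ARCH[arch]
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]


def recover_worker(host, user, password, arch, dry_run=True):
    """Run the recovery command, or only return it when dry_run is set."""
    cmd = build_ipmi_command(host, user, password, arch)
    if not dry_run:
        # check=True raises CalledProcessError if ipmitool exits non-zero
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping the per-architecture action in a single mapping means the PowerPC behavior can be changed without touching the recovery logic shared with the ARM workers.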
¹ They just randomly crash and logs just end without further clues, e.g. #114565#note-40. In addition, they sometimes also get stuck at boot.
Updated by okurz over 2 years ago
- Target version set to future
Can you provide a little bit more context regarding "failing similarly"? Didn't you also have a bug report and there were suggestions regarding kdump and such?
Updated by mkittler over 2 years ago
Can you provide a little bit more context regarding "failing similarly"?
There's not much to say about it. They just randomly crash and the journal doesn't give one any clues; it just ends at some point. In addition, they sometimes also get stuck at boot.
Didn't you also have a bug report and there were suggestions regarding kdump and such?
Yes. I can link the relevant progress ticket for additional context. However, I'm not sure whether we can fix this problem anytime soon.
Updated by mkittler over 2 years ago
- Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added
Updated by mkittler over 2 years ago
- Related to action #116437: Recover qa-power8-5 size:M added