Add OSD PowerPC workers to automatic recovery we already have for ARM workers
These workers often fail in a similar way to the ARM workers¹, and at this point we should not need to recover them manually so frequently.
- Note that for these workers a `power cycle` does not always work but `power reset` seems to work always. So maybe that detail needs to be adjusted for PowerPC workers.
- I suppose all PowerPC workers controllable via IPMI should be considered (see `workerconf.sls` in salt pillars).
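The difference between the two power actions could be captured as a per-architecture setting. The following is only a minimal sketch, not the existing recovery implementation: the function name, the `ppc64le` key, and the idea of passing host and credentials as plain arguments are assumptions for illustration. It only builds the `ipmitool` argument list and does not execute anything.

```python
from typing import List

# Assumption: PowerPC workers are identified by the arch string "ppc64le"
# and need "reset"; all other workers keep the "cycle" action.
POWER_ACTION_BY_ARCH = {"ppc64le": "reset"}
DEFAULT_POWER_ACTION = "cycle"


def ipmi_recovery_command(host: str, user: str, password: str, arch: str) -> List[str]:
    """Build (but do not run) the ipmitool call used to recover a worker."""
    action = POWER_ACTION_BY_ARCH.get(arch, DEFAULT_POWER_ACTION)
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]
```

With this, a hypothetical call such as `ipmi_recovery_command("worker-bmc.example", "admin", "secret", "ppc64le")` ends in `chassis power reset`, while any other architecture keeps `chassis power cycle`.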
¹ They just randomly crash and logs just end without further clues, e.g. #114565#note-40. In addition, they sometimes also get stuck at boot.
Updated by mkittler over 1 year ago
> Can you provide a little bit more context regarding "failing similarly"?
There's not much to say about it. They just randomly crash and the journal doesn't give one any clues; it just ends at some point. In addition, they sometimes also get stuck at boot.
> Didn't you also have a bug report and there were suggestions regarding kdump and such?
Yes. I can link the relevant progress ticket for additional context. However, I'm not sure whether we can fix this problem anytime soon.