action #71590
open[osd][alert] Implement proper monitoring of needed resources of workers
Start date: 2020-09-21
Due date:
% Done: 0%
Estimated time:
Description
We just had a case where all powerVM jobs were failing to boot (result: failed) because the VIOS of one of our powerVM hosts was down.
So while we monitor the worker host itself (grenache), there is no monitoring at all for its SUT machines (e.g. redcurrant). Ideas for what we could add to our monitoring:
- PowerPC:
  - availability (ssh) of powerhmc1.suse.de and powerhmc2.suse.de
  - availability of the VIOS for a powerVM host (maybe by using the HMC API and polling the VIOS state?)
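The first idea could be sketched roughly as below. This is only an illustration, assuming that a successful TCP connect to port 22 is an acceptable stand-in for "ssh availability"; it does not log in and it does not cover the VIOS state, which would need the HMC itself. The function names `ssh_port_open` and `unreachable_hosts` are hypothetical and not part of any existing monitoring setup.

```python
import socket

# Hosts named in this ticket; port 22 is the standard SSH port.
HMC_HOSTS = ["powerhmc1.suse.de", "powerhmc2.suse.de"]


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: reachability of the SSH port is a good enough proxy for
    "ssh availability"; no authentication is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def unreachable_hosts(hosts, probe=ssh_port_open):
    """Return the hosts for which the probe fails, preserving input order."""
    return [host for host in hosts if not probe(host)]
```

Calling `unreachable_hosts(HMC_HOSTS)` from a periodic job and alerting on a non-empty result would be one way to hook this into existing monitoring. The `probe` parameter is injectable so the logic can be exercised without network access.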
Updated by okurz about 4 years ago
- Priority changed from Normal to Low
- Target version set to future
I would like that. But considering how slowly we currently progress within the infrastructure project, and that special hardware hypervisor hosts and the like are out of scope for SUSE QA Tools (see https://progress.opensuse.org/projects/qa/wiki#Out-of-scope ), I consider this low priority with target version "future".