action #71590

open

[osd][alert] Implement proper monitoring of needed resources of workers

Added by nicksinger over 3 years ago. Updated over 3 years ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
-
Target version:
Start date:
2020-09-21
Due date:
% Done:

0%

Estimated time:

Description

We just had a case where all powerVM jobs were failing to boot (result: failed) because the VIOS of one of our powerVM hosts was down.
While we monitor the worker host itself (grenache), there is no monitoring at all for its SUT machines (e.g. redcurrant). Ideas for what we could add to our monitoring:

  • PowerPC:
    1. availability (ssh) of powerhmc1.suse.de and powerhmc2.suse.de
    2. availability of the VIOS for a powerVM host (maybe by using the HMC API and polling the VIOS state?)
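A minimal sketch of both checks, assuming a Python monitoring script. The ssh check only opens a TCP connection to port 22 and verifies that an SSH banner comes back; it does not authenticate. How the VIOS states are fetched from the HMC is deliberately left out: `fetch_vios_states` is a hypothetical callable, and the assumption that a healthy VIOS reports the state `running` should be verified against what the HMC actually returns.

```python
import socket
from typing import Callable, Dict, List

# HMC hosts named in this ticket.
HMC_HOSTS = ["powerhmc1.suse.de", "powerhmc2.suse.de"]


def ssh_available(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if host accepts a TCP connection and sends an SSH banner.

    Only checks that sshd answers; no login is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(256)
    except OSError:  # covers DNS failure, refused connection and timeouts
        return False
    return banner.startswith(b"SSH-")


def vios_alerts(states: Dict[str, str], healthy: str = "running") -> List[str]:
    """Return one alert per VIOS whose reported state is not `healthy`.

    Assumption: the HMC reports a working VIOS as "running".
    """
    return [
        f"VIOS {name} is in state {state!r}, expected {healthy!r}"
        for name, state in sorted(states.items())
        if state.lower() != healthy.lower()
    ]


def check_all(fetch_vios_states: Callable[[], Dict[str, str]]) -> List[str]:
    """Combine both checks; fetch_vios_states is a hypothetical HMC poller."""
    alerts = [f"HMC {h} not reachable via ssh" for h in HMC_HOSTS if not ssh_available(h)]
    alerts.extend(vios_alerts(fetch_vios_states()))
    return alerts
```

An empty list from `check_all` would mean "all good"; any returned string could be forwarded to the existing alerting for grenache, so a failing VIOS is reported before jobs start failing to boot.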
#1

Updated by okurz over 3 years ago

  • Priority changed from Normal to Low
  • Target version set to future

I would like that. However, considering how slowly we currently progress within the infrastructure project, and that special hardware hypervisor hosts and the like are out of scope for SUSE QA Tools (see https://progress.opensuse.org/projects/qa/wiki#Out-of-scope), I consider this low priority with target version "future".
