action #71590: [osd][alert] Implement proper monitoring of needed resources of workers - openQA Infrastructure (public) - openSUSE Project Management Tool

action #71590

open

[osd][alert] Implement proper monitoring of needed resources of workers

Added by nicksinger over 4 years ago. Updated over 4 years ago.

Status: New
Priority: Low
Assignee:
Category:
Target version: QA (public) - future
Start date: 2020-09-21
Due date:
% Done:
Estimated time:

Description

We just had a case where all powerVM jobs were failing to boot (result: failed) because the VIOS of one of our powerVM hosts was down.
So while we monitor the worker host itself (grenache), there is no monitoring at all for its SUT machines (e.g. redcurrant). Ideas for what we could add to our monitoring:

PowerPC:
1. availability (ssh) of powerhmc1.suse.de and powerhmc2.suse.de
2. availability of the VIOS for a powerVM host (maybe by using the HMC API and polling the VIOS state?)
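
A minimal sketch of idea 1, assuming that "availability (ssh)" can be approximated by a successful TCP connection to port 22 on each HMC. The function names `ssh_port_reachable` and `check_hosts`, the port and the timeout are illustrative assumptions, not part of any existing monitoring setup. It does not cover idea 2, since polling the VIOS state depends on the HMC interface that is actually available.

```python
import socket

# Hosts named in this ticket; port 22 is the assumed default SSH port.
HMC_HOSTS = ["powerhmc1.suse.de", "powerhmc2.suse.de"]


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_hosts(hosts, port=22, timeout=5.0):
    """Map each host name to its reachability (hypothetical helper)."""
    return {host: ssh_port_reachable(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_hosts(HMC_HOSTS).items():
        print(f"{host}: {'up' if ok else 'DOWN'}")
```

This only shows that something is listening on the SSH port, not that a login works; the result could be fed into whatever alerting is already in place for the worker hosts.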

Updated by okurz over 4 years ago

Priority changed from Normal to Low
Target version set to future

I would like that. However, considering how slowly we are currently progressing within the infrastructure project, and that special hardware, hypervisor hosts and the like are out of scope for SUSE QA Tools (see https://progress.opensuse.org/projects/qa/wiki#Out-of-scope ), I consider this low priority with target version "future".
