action #167728: grafana dashboard for monitor.qe.nue2.suse.org size:S - openQA Infrastructure (public) - openSUSE Project Management Tool

closed

coordination #161414: [epic] Improved salt based infrastructure management

grafana dashboard for monitor.qe.nue2.suse.org size:S

Added by okurz 7 months ago. Updated 7 months ago.

Status:

Resolved

Priority:

Normal

Assignee:

gpathak

Category:

Feature requests

Target version:

openQA Project (public) - Ready

Start date:

2024-10-02

Due date:

% Done:

Estimated time:

Tags:

space, monitor, influxdb, grafana, infra, vm, qamaster

Description

Motivation

In #167719 monitor.qe.nue2.suse.org ran out of disk space because influxdb grew to 300G+. okurz only noticed because grafana stopped showing up-to-date data. There was no related alert, and also no alert about the shrinking free space on the host before the incident. We have telegraf running on monitor, but we have no generic machine dashboard for it. We should have one, as we do for other "generic" machines, i.e. those that are neither worker nor webUI.

Acceptance criteria

AC1: A machine-specific grafana dashboard with alert definitions exists for all machines (including monitor.qe.nue2.suse.org)
AC2: Machines with special roles like openQA worker and openQA webUI don't have multiple dashboards showing the same data (e.g. not a special openQA one plus a generic one)

Suggestions

In https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/ac3057d7835f7ab842c9c3a508417f87e7ab6d55/monitoring/grafana.sls#L8 we set "genericnames" to all machines which are not worker, not webui and not monitor. That means we don't create generic machine monitoring for monitor. Review where "genericnames" is used and evaluate whether it can be tweaked to still create a generic machine monitoring dashboard for monitor without applying other potentially harmful changes to the host.
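To illustrate the idea, here is a minimal Python sketch of the filtering logic described above. It is not the actual Jinja code from grafana.sls. The inventory, the example host names other than monitor.qe.nue2.suse.org, the role names as set members, and the names `names_without_roles` and `dashboard_names` are all hypothetical. It only shows one possible tweak: keep "genericnames" as it is for everything else and derive a second, dashboard-only list that still includes monitor.

```python
# Hypothetical inventory: host name -> set of role names.
# This is an assumption for illustration, not the real salt grains or pillar data.
HOSTS = {
    "worker-example": {"worker"},
    "webui-example": {"webui"},
    "monitor.qe.nue2.suse.org": {"monitor"},
    "generic-example": set(),
}


def names_without_roles(hosts, excluded_roles):
    """Return the sorted host names that carry none of the excluded roles."""
    excluded = set(excluded_roles)
    return sorted(name for name, roles in hosts.items() if not (roles & excluded))


# Behaviour as described in the ticket: monitor is excluded along with worker and webui.
genericnames = names_without_roles(HOSTS, {"worker", "webui", "monitor"})

# Possible tweak (assumption): a separate list used only for dashboard creation,
# so monitor gets a generic dashboard while other uses of genericnames stay unchanged.
dashboard_names = names_without_roles(HOSTS, {"worker", "webui"})

print(genericnames)
print(dashboard_names)
```

Whether a second list or a changed condition on the existing one is the better fit depends on where else "genericnames" is used, which is what the review step above is meant to find out.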

Related issues 2 (1 open — 1 closed)

Updated by okurz 7 months ago

Updated by nicksinger 7 months ago

Updated by okurz 7 months ago

Updated by okurz 7 months ago

Updated by gpathak 7 months ago

Updated by gpathak 7 months ago

Updated by okurz 7 months ago

Updated by okurz 7 months ago

Updated by gpathak 7 months ago

Updated by gpathak 7 months ago

Updated by gpathak 7 months ago

Updated by gpathak 7 months ago