action #131150

closed

coordination #132275: [epic] Better o3 monitoring

Add alarms for partition usage on o3 size:M

Added by okurz 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-06-20
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

From https://mailman.suse.de/mlarch/SuSE/o3-admins/2023/o3-admins.2023.06/msg00042.html : We received an alert message from munin about /assets on o3 being 92% full. I think 92% (by now even increased to 93% on o3) is alarming, and we should have been notified about it by zabbix, where the old alarm thresholds were likely not migrated. We should ensure that there is sufficient alerting. We could go with munin, but I guess for something as low-level as disk usage zabbix should be easy enough to use.

Acceptance criteria

  • AC1: A SUSE-IT maintained monitoring solution will alert us if /assets exceeds 90% usage
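To make the condition in AC1 concrete, here is a minimal sketch of the threshold check in Python. It is not the zabbix or munin configuration, which this ticket does not spell out; the function names and the 90% default are assumptions chosen for illustration only.

```python
import shutil


def usage_percent(used, total):
    """Return used space as a percentage of total space."""
    return 100.0 * used / total


def exceeds_threshold(used, total, threshold_percent=90.0):
    """True if usage is strictly above the threshold (AC1 says "exceeds")."""
    return usage_percent(used, total) > threshold_percent


def partition_exceeds_threshold(path, threshold_percent=90.0):
    """Check the filesystem containing `path`.

    Note: shutil.disk_usage reports used = total - free, so the result can
    differ slightly from the Use% column of `df`, which excludes blocks
    reserved for root.
    """
    usage = shutil.disk_usage(path)
    return exceeds_threshold(usage.used, usage.total, threshold_percent)
```

On o3 the check would be called as `partition_exceeds_threshold("/assets")`. With the values from the motivation, 93% usage triggers the check, while exactly 90% does not.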

Suggestions


Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure - action #132218: Conduct lessons learned for "openQA is not accessible" on 2023-07-02 | Resolved | okurz | 2023-07-02

Related to openQA Infrastructure - action #132815: [alert][flaky][o3] Multiple flaky zabbix alerts related to o3 | Resolved | jbaier_cz | 2023-07-16

Copied from openQA Infrastructure - action #131147: Reduce /assets usage on o3 | Resolved | okurz | 2023-06-20
