action #70885

closed

[osd][alert] flaky file system alert: /assets

Added by okurz over 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
-
Start date:
2020-09-02
Due date:
2021-07-23
% Done:

0%

Estimated time:
Tags:

Description

Observation

Received an alert email on 2020-09-02 at 14:27Z:

[Alerting] File systems alert

One of the file systems is too full

Metric name: /assets: Used Percentage
Value: 94.207

30 minutes later the status switched back to "OK", but I guess we can easily hit the limit again.

The panel can be found at
https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=74&orgId=1

Problem

The alert is flaky as it went back to "ok" without explicit user action.

Suggestions

  • Make sure some assets are cleaned up, as we cannot keep that many and 4.7TB for assets is too much.
  • Research whether better hysteresis can be implemented in Grafana, e.g. the alert would trigger when usage reaches 94% but only recover once usage drops below 92%
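
The hysteresis idea from the second suggestion can be sketched in plain Python. This is only an illustration of the two-threshold logic, not a Grafana configuration or API: the class name `HysteresisAlert`, its field names, and the sample values are hypothetical, while the 94% trigger and 92% recovery thresholds are taken from the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlert:
    """Two-threshold alert state (hypothetical helper, not part of Grafana)."""

    trigger_at: float = 94.0      # start alerting at or above this usage (%)
    recover_below: float = 92.0   # only recover once usage drops below this (%)
    alerting: bool = False

    def update(self, used_percent: float) -> bool:
        """Feed one usage sample and return the resulting alert state."""
        if not self.alerting and used_percent >= self.trigger_at:
            self.alerting = True
        elif self.alerting and used_percent < self.recover_below:
            self.alerting = False
        # Between the two thresholds the previous state is kept unchanged.
        return self.alerting


def evaluate(samples, alert=None):
    """Return the alert state after each sample, in order."""
    if alert is None:
        alert = HysteresisAlert()
    return [alert.update(s) for s in samples]


if __name__ == "__main__":
    # Made-up usage values that hover around the thresholds.
    print(evaluate([93.0, 94.207, 93.5, 92.4, 91.9, 93.0]))
```

With a single 94% threshold the sample at 93.5% would already flip the alert back to "ok". With the two thresholds the alert stays active until usage falls below 92%, which is the behaviour the suggestion asks for.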

Further notes

I did not pause the alert as it is currently "ok" and we need to be careful that the available disk space is not completely depleted.

94% usage on a file system is already high. We must not increase the alert threshold further.


Related issues 1 (0 open, 1 closed)

Copied to openQA Infrastructure (public) - action #71575: [osd][alert] limited /assets - idea: ask EngInfra for slow+cheap storage from central server for /assets/fixed only (Resolved, mkittler, 2020-09-02)
