action #70885

Updated by okurz almost 3 years ago

## Observation 

Received an alert email at 2020-09-02 14:27Z:

```
[Alerting] File systems alert

One of the file systems is too full

Metric name                 Value
/assets: Used Percentage    94.207
```

30 minutes later the status switched back to "OK", but I guess we can easily hit the limit again.

The panel can be found at
https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=74&orgId=1

 ## Problem 

The alert is flaky: it went back to "OK" without any explicit user action.

 ## Suggestions 

* Make sure some assets are cleaned up: we cannot keep that many, and 4.7TB for assets is too much.
* Research whether a better hysteresis can be implemented in Grafana, e.g. the alert would trigger when 94% is reached but only recover once usage drops below 92%.
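To illustrate the suggested hysteresis, here is a minimal sketch in plain Python, not Grafana configuration. The thresholds 94% and 92% come from the suggestion; the class `HysteresisAlert` and the sample values other than 94.207 are hypothetical and only show the intended behaviour. The point is that the state only changes when usage crosses the *other* threshold, so values hovering between 92% and 94% no longer flip the alert back and forth.

```python
class HysteresisAlert:
    """Hypothetical alert state with separate trigger and recovery thresholds."""

    def __init__(self, trigger: float = 94.0, recover: float = 92.0) -> None:
        if recover >= trigger:
            raise ValueError("recover threshold must be below trigger threshold")
        self.trigger = trigger
        self.recover = recover
        self.alerting = False

    def update(self, used_percent: float) -> str:
        # Enter the alerting state only when usage reaches the upper threshold.
        if not self.alerting and used_percent >= self.trigger:
            self.alerting = True
        # Leave the alerting state only once usage drops below the lower threshold.
        elif self.alerting and used_percent < self.recover:
            self.alerting = False
        return "Alerting" if self.alerting else "OK"


if __name__ == "__main__":
    alert = HysteresisAlert()
    # 94.207 is the value from the alert email; the other samples are made up.
    for sample in [93.5, 94.207, 93.0, 92.5, 91.8]:
        print(f"{sample:>7}: {alert.update(sample)}")
```

With these samples the state stays "Alerting" at 93.0 and 92.5, where a single 94% threshold would already have switched back to "OK".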

 ## Further notes 

I did not pause the alert because it is currently "OK" and we need to make sure the available disk space does not get completely depleted.

94% usage on a filesystem is already high. We must not raise the alert threshold any further.
