action #154177

Updated by mkittler 4 months ago

## Observation 

 From Grafana **[FIRING:1] (File systems alert Salt ai0h5ifVk)**: 

      F0=90.11097415563623 


 From OSD: 
    
 ``` 
 # df -h 
 Filesystem        Size    Used Avail Use% Mounted on 
 … 
 /dev/vdc           10T    9.0T    1.1T    90% /assets 
 ``` 

 ## Suggestions 
 * *DONE* Add a silence http://stats.openqa-monitor.qa.suse.de/alerting/silence/new?alertmanager=grafana&matcher=alertname%3DFile+systems+alert&matcher=grafana_folder%3DSalt&matcher=rule_uid%3Dai0h5ifVk&orgId=1 
 * View dashboard http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1 
 * View panel http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1&viewPanel=74 
 * Check which assets take the most space 
 * *DONE* Crosscheck that our asset cleanup is actually running (it runs) 
 * Our space-aware cleanup should keep a buffer free, so exceeding 90% now likely means that the job group quotas are too high in sum 
 * Check settings per job group and adjust quotas as necessary 
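
 For the "check which assets take the most space" step, a minimal sketch could look like the following. It assumes GNU `du`, `sort` and `head` are available on OSD and that `/assets` (the mount point from the `df` output above) is the directory of interest. The function name `largest_entries` and its defaults are made up for illustration and are not part of openQA.

 ```shell
# Hypothetical helper: print the largest directories below a path, biggest first.
# $1: directory to inspect (defaults to /assets, taken from the df output above)
# $2: number of lines to print (defaults to 20)
largest_entries() {
    dir="${1:-/assets}"
    count="${2:-20}"
    # -x: do not cross file system boundaries
    # -k: report sizes in KiB so that a numeric sort works
    # --max-depth=2: report the directory itself and two levels of subdirectories
    du -xk --max-depth=2 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
 ```

 Calling `largest_entries /assets 10` would list the ten largest directories. The first line is the total for `/assets` itself, which can be compared against the `Used` column reported by `df`.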

 ## Rollback steps 
 * Remove silence from https://stats.openqa-monitor.qa.suse.de/alerting/silences
