action #154177
Updated by okurz 11 months ago
## Observation

From Grafana **[FIRING:1] (File systems alert Salt ai0h5ifVk)**: F0=90.11097415563623

From OSD:

```
# df -h
Filesystem      Size  Used Avail Use% Mounted on
…
/dev/vdc         10T  9.0T  1.1T  90% /assets
```

## Suggestions

* *DONE* Add a silence http://stats.openqa-monitor.qa.suse.de/alerting/silence/new?alertmanager=grafana&matcher=alertname%3DFile+systems+alert&matcher=grafana_folder%3DSalt&matcher=rule_uid%3Dai0h5ifVk&orgId=1
* *DONE* View dashboard http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1
* *DONE* View panel http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1&viewPanel=74
* *DONE* Check which assets take the most space
* *DONE* (it runs) Crosscheck that our asset cleanup is actually running
* *DONE* Our space-aware cleanup should keep a buffer free so if we are now exceeding 90% that likely means that job group quotas are way too high in sum
* *DONE* Check settings per job group and adjust quotas as necessary

## Rollback steps

* *DONE* Remove silence from https://stats.openqa-monitor.qa.suse.de/alerting/silences
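
For the suggestion "Check which assets take the most space", a minimal sketch using plain `du` could look like the following. The function name `largest_entries` and its arguments are made up for illustration; this is not an openQA tool and not necessarily how the check was done for this ticket.

```shell
# Hypothetical helper: list the largest entries directly below a directory.
# $1: directory to inspect (defaults to the current directory)
# $2: number of lines to show (defaults to 10)
# Sizes are in KiB. The first line is normally the total for the directory itself.
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # -x: stay on one file system, -k: report KiB, -d 1: only one level deep
    du -x -k -d 1 -- "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

On OSD this could be called as `largest_entries /assets`, assuming the caller has read access to all subdirectories. Directories that cannot be read are silently skipped because errors from `du` are discarded.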