action #150887

Updated by okurz 6 months ago

## Observation 
 From email 
  Firing 
 s390zl12: partitions usage (%) alert 
 Values 
 A0=88.11778063708574  
 Labels 
 alertname        	 s390zl12: partitions usage (%) alert 
 grafana_folder        	 Generic 
 hostname        	 s390zl12 
 rule_uid        	 partitions_usage_alert_s390zl12 
 type        	 generic 
 Observed 32s before this notification was delivered, at 2023-11-15 03:48:00 +0100 CET 

 panel link http://stats.openqa-monitor.qa.suse.de/d/GDs390zl12?orgId=1&viewPanel=65090 

 ## Suggestions 
 * At least one partition was about 88% full (A0=88.12), which is apparently above our alert threshold
 * Check the actual threshold 
 * Ensure that our NFS share from OSD is not the one we alert about 
 * A cleanup script is triggered by cron or a systemd timer (TBC). It might run less often than we check the partition usage, so the alert may be racy
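
 The first three suggestions could be checked with something like the following minimal sketch. It is not the actual alert rule: the threshold value, the set of file system types treated as network shares, and the mount points in the usage note are all assumptions made for illustration, since the ticket does not state them.

```python
from dataclasses import dataclass

# Assumed threshold for illustration; the real alert threshold is not stated in this ticket.
ASSUMED_THRESHOLD_PERCENT = 85.0
# File system types treated as network shares in this sketch (assumption).
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs"}


@dataclass
class Partition:
    mountpoint: str
    fstype: str
    total_bytes: int
    used_bytes: int

    @property
    def used_percent(self) -> float:
        # Guard against empty pseudo file systems reporting a size of zero.
        if self.total_bytes == 0:
            return 0.0
        return 100.0 * self.used_bytes / self.total_bytes


def partitions_over_threshold(partitions, threshold=ASSUMED_THRESHOLD_PERCENT,
                              skip_network=True):
    """Return (mountpoint, used_percent) pairs at or above the threshold.

    With skip_network=True, network shares such as NFS mounts are ignored,
    so a full share cannot trigger the check on its own.
    """
    result = []
    for p in partitions:
        if skip_network and p.fstype in NETWORK_FSTYPES:
            continue
        if p.used_percent >= threshold:
            result.append((p.mountpoint, round(p.used_percent, 2)))
    return result
```

 Calling `partitions_over_threshold([Partition("/data", "ext4", 1000, 881), Partition("/share", "nfs", 100, 88)])` with these hypothetical mount points reports only `/data`. Passing `skip_network=False` would show whether the NFS share alone would have crossed the assumed threshold.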

 ## Rollback actions 
 * Remove silence for `rule_uid=~partitions_usage_alert_s390zl.*` from https://stats.openqa-monitor.qa.suse.de/alerting/silences
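
 To illustrate which alert rules the silence above covers, here is a small sketch. It assumes the matcher is evaluated as a fully anchored regular expression, as Alertmanager-style `=~` matchers are. Every rule UID other than `partitions_usage_alert_s390zl12` is hypothetical.

```python
import re

# Regex taken from the silence matcher in the rollback action above.
SILENCE_PATTERN = r"partitions_usage_alert_s390zl.*"


def is_silenced(rule_uid: str) -> bool:
    """Return True if rule_uid matches the silence regex as a whole string."""
    return re.fullmatch(SILENCE_PATTERN, rule_uid) is not None


# Only the first UID appears in this ticket; the others are made-up examples.
for uid in ("partitions_usage_alert_s390zl12",
            "partitions_usage_alert_s390zl13",
            "partitions_usage_alert_worker1"):
    print(uid, is_silenced(uid))
```

 Under that assumption the silence covers every host whose rule UID starts with `partitions_usage_alert_s390zl`, not only s390zl12, so removing it re-enables the alert for all of them.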
