
action #160481

Updated by mkittler about 2 months ago

## Observation 

 ``` 
 Fri, 17 May 2024 04:01:33 +0200 

 1 firing alert instance 

 📁 GROUPED BY  

 hostname=backup-vm 

   🔥 1 firing instances 

 Firing [stats.openqa-monitor.qa.suse.de] 
 backup-vm: partitions usage (%) alert 
 View alert [stats.openqa-monitor.qa.suse.de] 
 Values 
 A0=86.0003690373683  
 Labels 
 alertname 
 backup-vm: partitions usage (%) alert 
 grafana_folder 
 ``` 

 http://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_backup-vm/view?orgId=1 

 ~~Also, possibly related:
 https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1~~
 Not related; this was about `backup-qam` (and *not* `backup-vm`, which this ticket is about): https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1

 ``` 
 Failed systemd services 
 2024-05-16 15:27:50      backup-qam      check-for-kernel-crash, kdump-notify 
 ``` 

 ## Suggestions 
 * Check the partition usage and which components contribute the most to it
 * Check what caused this short surge in usage
 * Consider increasing the size of the virtually attached storage
 * ~~Consider tweaking our backup rules to either include less data or reduce retention~~ Not useful: the alert was about the root partition, whereas backups are stored on the separate partition `/dev/vdb1`.
 * Or do nothing if, based on an investigation of the monitoring data, this happened only once and is unlikely to happen again
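The first suggestion is usually answered with `df -h /` and `du -xh --max-depth=1 /` on the host. As a minimal sketch of the same two checks in Python, assuming it runs on `backup-vm` itself with enough privileges to read every directory, the helpers below compute the root partition usage and rank its top-level directories by size. The function names are hypothetical and not part of any existing tooling, and the percentage mirrors `df`'s "Use%" formula, which may differ from how the Grafana alert computes `A0`.

```python
import os
import shutil
import stat


def usage_percent(path="/"):
    """Percentage in use of the filesystem containing `path`.

    Assumption: mirrors df's "Use%" (used / (used + available)); the
    Grafana alert may compute its value differently.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return usage.used / denominator * 100 if denominator else 0.0


def _on_device(path, dev):
    """True if `path` itself (symlinks not followed) lives on filesystem `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def dir_size(path, dev):
    """Apparent size in bytes of regular files under `path`, staying on `dev` like `du -x`."""
    total = 0
    for root, dirs, files in os.walk(path, onerror=lambda err: None):
        # Prune subdirectories that are mount points of other filesystems
        dirs[:] = [d for d in dirs if _on_device(os.path.join(root, d), dev)]
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_children(path="/", top=10):
    """The `top` largest immediate subdirectories of `path` as (bytes, path), biggest first."""
    dev = os.lstat(path).st_dev
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip symlinks and directories that belong to other mounts
            if entry.is_dir(follow_symlinks=False) and _on_device(entry.path, dev):
                sizes.append((dir_size(entry.path, dev), entry.path))
    return sorted(sizes, reverse=True)[:top]
```

For example, `usage_percent("/")` gives a value comparable to `A0` above (if the alert uses the same formula), and `largest_children("/")` lists the top-level directories using the most space on the root filesystem. Sizes are apparent file sizes, so sparse files are counted larger than plain `du` (without `--apparent-size`) would report.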
