action #160481
closed
backup-vm: partitions usage (%) alert & systemd services alert size:S
Description
Observation
Fri, 17 May 2024 04:01:33 +0200
1 firing alert instance
📁 GROUPED BY
hostname=backup-vm
🔥 1 firing instances
Firing [stats.openqa-monitor.qa.suse.de]
backup-vm: partitions usage (%) alert
View alert [stats.openqa-monitor.qa.suse.de]
Values
A0=86.0003690373683
Labels
alertname
backup-vm: partitions usage (%) alert
grafana_folder
Also, possibly related:
https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1
Not related, this was about backup-qam (and not backup-vm, which this ticket is about):
Failed systemd services
2024-05-16 15:27:50 backup-qam check-for-kernel-crash, kdump-notify
Suggestions
- Check partition usage and which component contributes the most space usage
- Check what happened that we had this short high usage surge
- Consider increasing the size of the virtually attached storage
- Consider tweaking our backup rules to either include less or retain less. Not useful: it was the root partition (but backups are on the separate partition /dev/vdb1).
- Or maybe don't do anything if this only happened once and is not likely to happen again based on monitoring data investigation
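The first suggestion (checking partition usage) can be approximated with a short script. This is only a minimal sketch, assuming Python 3 is available on the host; the function names and the threshold of 85.0 are hypothetical and are not taken from the Grafana alert rule. The value may differ slightly from what `df` or the monitoring reports, because blocks reserved for root are counted differently.

```python
import shutil


def usage_percent(used: int, total: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def partition_usage_percent(mount_point: str) -> float:
    """Usage of the filesystem that contains mount_point, in percent."""
    usage = shutil.disk_usage(mount_point)
    return usage_percent(usage.used, usage.total)


def exceeds(percent: float, threshold: float) -> bool:
    """True if the usage is above the given (hypothetical) threshold."""
    return percent > threshold


if __name__ == "__main__":
    # "/" is the root partition; 85.0 is an assumed threshold for illustration
    pct = partition_usage_percent("/")
    print(f"/ is {pct:.1f} % full, above threshold: {exceeds(pct, 85.0)}")
```

Keeping the percentage calculation separate from the `shutil.disk_usage` call makes it easy to try different thresholds against values already seen in the alert.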
Updated by mkittler 6 months ago
URL to relevant panel at the relevant time: https://stats.openqa-monitor.qa.suse.de/d/GDbackup-vm/dashboard-for-backup-vm?viewPanel=65090&orgId=1&from=1715665167550&to=1716249102721
Updated by mkittler 6 months ago
Looks like an update was going on at the time. Probably some snapper cleanup "fixed" the problem later. Considering the file system reached only 86.2 % there was no real problem. Maybe we should just bump the threshold to 90 % because always expecting so much headroom seems a bit wasteful.
I freed up almost 500 MiB by uninstalling libLLVM7 libLLVM9 libLLVM11 libLLVM15 webkit2gtk-4_0-injected-bundles webkit2gtk-4_1-injected-bundles WebKitGTK-4.0-lang gnome-online-accounts gnome-online-accounts-lang. 500 MiB is not that much of course. According to ncdu the root filesystem isn't really bloated so I don't think there's much more to gain here, though. According to snapper list there's also not one big snapshot.
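For a rough, scriptable view of which directories take the most space (similar in spirit to what ncdu shows interactively), something like the following could be used. It is a sketch under assumptions: the helper names are hypothetical, it sums apparent file sizes rather than allocated blocks, it does not stop at filesystem boundaries, and it knows nothing about space held by btrfs snapshots.

```python
import os


def tree_size(path: str) -> int:
    """Sum the apparent sizes of all files below path without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                # lstat: a symlink is counted by its own size, not its target's
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def largest_children(path: str, limit: int = 10) -> list[tuple[str, int]]:
    """Return up to `limit` direct children of path as (name, bytes), largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    size = tree_size(entry.path)
                else:
                    size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
            sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    # "/usr" is only an example starting point
    for name, size in largest_children("/usr", limit=5):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```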
Updated by openqa_review 6 months ago
- Due date set to 2024-06-06
Setting due date based on mean cycle time of SUSE QE Tools