[alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M
This triggers the alert https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_openqa-piworker/view, for which I've been adding a silence.
Strangely, I haven't received a notification mail about it.
- AC1: The alert is no longer firing
- AC2: We will still be notified if the disk is really almost full
- Tweak the alert for that worker specifically (so we can utilize more of it without triggering the alert)
- Check with e.g. ncdu what could be cleaned up
- Possibly make the journal non-volatile again; however, maybe we should just utilize the disk space we have as the journal is valuable (so basically suggestion one seems preferable)
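For a quick scripted look in the spirit of the ncdu check suggested above, a minimal sketch could sum up file sizes per top-level directory. The helper names `dir_size` and `largest_subdirs` are made up for illustration, and the numbers are apparent file sizes, not the on-disk block usage ncdu reports by default:

```python
import os


def dir_size(path):
    """Return the summed apparent size in bytes of all files below path."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda err: None):
        for name in filenames:
            try:
                # lstat so a symlink counts as itself, not as its target
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # file vanished or is unreadable; skip it
                pass
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (name, size) pairs for direct subdirectories of root, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # "/var" is only an example starting point
    for name, size in largest_subdirs("/var"):
        print(f"{size / 1024 ** 2:10.1f} MiB  {name}")
```

Running it as root avoids undercounting directories that a normal user cannot read.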
ncdu doesn't show a clear culprit. Logs take a great share of the space but they are generally rotated and it makes sense to have them in case we need to investigate a problem. The caches also don't seem to overgrow.
I guess for this very size-constrained setup we should just bump our threshold. It doesn't make sense to waste 15 % of the disk space (by not using it) just to be safe. Let's reserve just 10 % instead; then the alert shouldn't trigger anymore.
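To illustrate the arithmetic behind the bump, here is a minimal sketch. It is not the actual Grafana alert rule from salt-states-openqa; the function names, the strict comparison and the 88 % example figure are assumptions made up for illustration:

```python
import shutil


def used_percent(total_bytes, used_bytes):
    """Return the used share of a partition in percent."""
    return 100.0 * used_bytes / total_bytes


def alert_fires(used_pct, reserve_pct):
    """Return True once less than reserve_pct of the partition is left unused.

    A reserve of 15 % corresponds to a usage threshold of 85 %,
    a reserve of 10 % to a usage threshold of 90 %.
    """
    return used_pct > 100.0 - reserve_pct


def partition_used_percent(path="/"):
    """Return the used percentage of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.total, usage.used)


if __name__ == "__main__":
    example = 88.0  # hypothetical usage, not a measured value from the worker
    print("fires with 15 % reserve:", alert_fires(example, 15.0))
    print("fires with 10 % reserve:", alert_fires(example, 10.0))
    print("usage of / on this machine: %.1f %%" % partition_used_percent("/"))
```

With a smaller reserve the alert still fires when the disk is really almost full (AC2), just later than before.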
- Status changed from In Progress to Feedback
The salt states MR unfortunately doesn't work. Follow-up: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/839
- Status changed from Feedback to Resolved
After the follow-up, the adapted alert is actually in effect, so it is no longer firing and I've removed the silence.
I've also created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/840 to document how to test changes like this locally.