action #127274
[alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M
Status: Closed
Description
Observation
This triggers the alert https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_openqa-piworker/view, for which I've been adding a silence.
Strangely, I haven't received a notification email about it.
Acceptance Criteria
- AC1: The alert is no longer firing
- AC2: We will still be notified if the disk is really almost full
Suggestions
- Tweak the alert for that worker specifically (so we can utilize more of it without triggering the alert)
- Check what could be cleaned up, e.g. with ncdu
- Possibly make the journal non-volatile again; however, maybe we should just utilize the disk space we have, as the journal is valuable (so suggestion one seems preferable)
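ncdu is interactive. For a quick non-interactive overview of which directories take the most space, a minimal sketch like the following could serve as a rough substitute. It sums apparent file sizes (not allocated disk blocks, which ncdu shows by default) per immediate subdirectory, does not follow symlinks and does not stay on a single filesystem. The function name is hypothetical and not part of any existing tooling.

```python
import os
import sys


def dir_sizes(root: str) -> dict[str, int]:
    """Return the apparent size in bytes of each immediate subdirectory of root.

    Rough, non-interactive stand-in for browsing with ncdu: symlinks are not
    followed, and files that vanish (e.g. rotated logs) or are unreadable are skipped.
    """
    sizes: dict[str, int] = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue  # only directories are summarized; top-level files are ignored
            total = 0
            # os.walk does not follow symlinked directories by default
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file removed in the meantime or permission denied
            sizes[entry.path] = total
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the ten largest subdirectories first, similar to ncdu's default sort order.
    ranked = sorted(dir_sizes(sys.argv[1]).items(), key=lambda kv: kv[1], reverse=True)
    for path, size in ranked[:10]:
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

Running it as e.g. `python3 dir_sizes.py /var` (run as root to see everything) lists the biggest directories below /var.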
Updated by mkittler over 1 year ago
- Tags set to alert, infra
- Project changed from openQA Project (public) to openQA Infrastructure (public)
Updated by okurz over 1 year ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by livdywan over 1 year ago
- Subject changed from [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold to [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 1 year ago
ncdu doesn't show a clear culprit. Logs take up a large share of the space, but they are generally rotated and it makes sense to keep them in case we need to investigate a problem. The caches don't seem to grow excessively either.
I guess for this very size-constrained setup we should just raise the threshold. It doesn't make sense to leave 15 % of the disk unused just to be safe. Let's reduce that margin to 10 %; then the alert shouldn't trigger anymore.
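The real alert is provisioned via salt-states-openqa and its query isn't shown here. As a rough sketch, assuming the alert fires once used space exceeds 100 % minus a free-space margin, a per-host override reducing that margin from 15 % to 10 % for this one worker could be modelled as follows. All constant and function names are hypothetical.

```python
import shutil

# Hypothetical values mirroring the discussion: by default the alert keeps
# 15 % of a partition free; the size-constrained piworker only keeps 10 %.
DEFAULT_FREE_MARGIN_PERCENT = 15
FREE_MARGIN_OVERRIDES_PERCENT = {"openqa-piworker": 10}


def usage_threshold_percent(host: str) -> int:
    """Used-space percentage above which the (hypothetical) alert fires."""
    margin = FREE_MARGIN_OVERRIDES_PERCENT.get(host, DEFAULT_FREE_MARGIN_PERCENT)
    return 100 - margin


def should_alert(host: str, used_bytes: int, total_bytes: int) -> bool:
    """True if usage on the given host exceeds its threshold."""
    used_percent = 100 * used_bytes / total_bytes
    return used_percent > usage_threshold_percent(host)


if __name__ == "__main__":
    # Evaluate the local root filesystem as if it were the piworker.
    usage = shutil.disk_usage("/")
    print(should_alert("openqa-piworker", usage.used, usage.total))
```

With 87 % used, `should_alert("openqa-piworker", 87, 100)` is False while any other host without an override would alert; at 91 % the piworker alert fires again, which keeps AC2 satisfied.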
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Updated by tinita over 1 year ago
- Subject changed from [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold size:M to [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M
Updated by mkittler over 1 year ago
Unfortunately, the salt-states MR doesn't work. Follow-up: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/839
Updated by mkittler over 1 year ago
- Status changed from Feedback to Resolved
With the follow-up, the adapted alert is now actually in effect, so it is no longer firing and I've removed the silence.
I've also created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/840 to document how to test changes like this locally.