action #127274

closed

[alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Start date:
2023-04-05
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Observation

This triggers the alert https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_openqa-piworker/view, which I've been silencing.

Strangely, I haven't received a notification mail about it.

Acceptance Criteria

  • AC1: The alert is no longer firing
  • AC2: We will still be notified if the disk is really almost full

Suggestions

  • Tweak the alert for that worker specifically (so we can utilize more of it without triggering the alert)
  • Check with e.g. ncdu what could be cleaned up
  • Possibly make the journal non-volatile again; however, maybe we should just utilize the disk space we have as the journal is valuable (so basically suggestion one seems preferable)
Actions #1

Updated by mkittler over 1 year ago

  • Tags set to alert, infra
  • Project changed from openQA Project (public) to openQA Infrastructure (public)
Actions #2

Updated by mkittler over 1 year ago

  • Description updated (diff)
Actions #3

Updated by okurz over 1 year ago

  • Priority changed from Normal to High
  • Target version set to Ready
Actions #4

Updated by livdywan over 1 year ago

  • Subject changed from [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold to [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by mkittler over 1 year ago

  • Assignee set to mkittler
Actions #6

Updated by mkittler over 1 year ago

ncdu doesn't show a clear culprit. Logs take up a large share of the space, but they are generally rotated and it makes sense to keep them in case we need to investigate a problem. The caches also don't seem to be growing excessively.

I guess for this very size-constrained setup we should just bump our threshold. It doesn't make sense to waste 15 % of the disk space (by not using it) just to be safe. Let's reserve just 10 % instead, and the alert shouldn't trigger anymore.
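To illustrate the reasoning, here is a minimal sketch of the threshold arithmetic. It assumes that leaving 15 % unused corresponds to an alert threshold of 85 % usage and that leaving 10 % unused corresponds to 90 %; the actual alert definition lives in the salt states and is not reproduced here. The function names are hypothetical.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share of the filesystem containing `path`, in percent.

    Note: this is used/total and may differ slightly from what `df` reports,
    since `df` accounts for blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def exceeds_threshold(used_percent: float, threshold_percent: float) -> bool:
    """Return True if usage is above the threshold (assumed alert condition)."""
    return used_percent > threshold_percent


if __name__ == "__main__":
    used = usage_percent("/")
    # 85 % and 90 % are assumed values derived from the 15 % / 10 % figures above.
    for threshold in (85.0, 90.0):
        state = "firing" if exceeds_threshold(used, threshold) else "ok"
        print(f"{used:.1f} % used, threshold {threshold:.0f} %: {state}")
```

Under these assumptions, a partition that is, say, 87 % full would trigger the alert with the old threshold but not with the new one, while a partition that is really almost full would still be reported (AC2).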

Actions #7

Updated by mkittler over 1 year ago

  • Status changed from Workable to In Progress
Actions #9

Updated by tinita over 1 year ago

  • Subject changed from [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds thesold size:M to [alert] Usage of partition mmcblk0p3 on openqa-piworker exceeds threshold size:M
Actions #10

Updated by mkittler over 1 year ago

The salt states MR unfortunately didn't work. Follow-up: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/839

Actions #11

Updated by mkittler over 1 year ago

  • Status changed from Feedback to Resolved

After the follow-up, the adapted alert is actually effective. So it is no longer firing, and I've removed the silence.

I've also created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/840 to document how to test changes like this locally.
