action #169750

closed

[alert] backup-vm (backup-vm: partitions usage (%) alert Generic partitions_usage_alert_backup-vm generic)

Added by ybonatakis about 1 month ago. Updated about 1 month ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Start date:
2024-11-12
Due date:
2024-11-27
% Done:

0%

Estimated time:

Description

I checked the machine and found
du -ah /path/to/directory | sort -rh | head -n 10

/dev/mapper/system-root   24G   14G  9.6G  59% /opt
/dev/vdb1                2.0T  1.7T  295G  86% /home

○ logrotate.service - Rotate log files
     Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static)
     Active: inactive (dead) since Tue 2024-11-12 00:00:02 CET; 15h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
   Main PID: 29592 (code=exited, status=0/SUCCESS)
        CPU: 163ms

Warning: some journal files were not opened due to insufficient permissions.

Action taken: I restarted logrotate and ran sudo du -ah /home | sort -rh | head -n 10
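The partition check above can be sketched as a small filter over `df`-style rows. This is a minimal sketch, not the alert's actual rule: the function name `over_threshold` and the 85% threshold are assumptions, since the ticket does not state the threshold the alert uses.

```shell
#!/bin/sh
# Print mount points whose usage is at or above a threshold (percent).
# Reads df-style rows on stdin: filesystem size used avail use% mountpoint.
# The name over_threshold and the threshold value are hypothetical.
over_threshold() {
    threshold="$1"
    awk -v t="$threshold" '
        $5 ~ /^[0-9]+%$/ {          # skip a header row and anything malformed
            pct = $5
            sub(/%/, "", pct)       # "86%" -> "86"
            if (pct + 0 >= t) print $6, $5
        }'
}

# Live usage would be: df -hP | over_threshold 85
# Rows quoted in this ticket:
printf '%s\n' \
    '/dev/mapper/system-root   24G   14G  9.6G  59% /opt' \
    '/dev/vdb1                2.0T  1.7T  295G  86% /home' | over_threshold 85
```

With the two rows from this ticket and a threshold of 85, only `/home 86%` is printed.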


Related issues 1 (1 open, 0 closed)

Is duplicate of openQA Infrastructure (public) - action #167722: Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M (Workable, nicksinger, 2024-10-02)

Actions #1

Updated by ybonatakis about 1 month ago

I also noticed that disk I/O has been significantly higher over the last 90 days in the relevant panels: https://stats.openqa-monitor.qa.suse.de/d/GDbackup-vm/dashboard-for-backup-vm?orgId=1&from=now-90d&to=now&timezone=browser&var-datasource=000000001&refresh=1m

After restarting logrotate, I found: Active: inactive (dead) since Tue 2024-11-12 16:33:35 CET; 37min ago
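
The "inactive (dead)" state is not by itself a failure here, assuming logrotate.service is a timer-triggered oneshot unit, as the "TriggeredBy: logrotate.timer" line in the description suggests. The exit status of the last run is the more telling field. A minimal sketch that checks it from `systemctl status` text follows; the helper name `last_run_ok` is hypothetical.

```shell
#!/bin/sh
# Succeeds when `systemctl status` text on stdin reports that the unit's
# main process exited cleanly, regardless of the "inactive (dead)" state.
# The name last_run_ok is hypothetical.
last_run_ok() {
    grep -q 'Main PID: .*(code=exited, status=0/SUCCESS)'
}

# Live check (needs systemd): systemctl status logrotate.service | last_run_ok
# Line quoted in this ticket:
if printf '%s\n' '   Main PID: 29592 (code=exited, status=0/SUCCESS)' | last_run_ok; then
    echo "last logrotate run succeeded"
else
    echo "last logrotate run failed or is unknown"
fi
```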

Actions #2

Updated by ybonatakis about 1 month ago

  • Is duplicate of action #167722: Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M added
Actions #3

Updated by ybonatakis about 1 month ago

  • Status changed from New to Closed
Actions #4

Updated by okurz about 1 month ago

  • Tags changed from infra to infra, reactive work
  • Due date set to 2024-11-27
  • Category set to Regressions/Crashes
  • Status changed from Closed to Feedback
  • Assignee set to ybonatakis
  • Priority changed from Normal to High
  • Target version set to Ready

@ybonatakis I don't understand. Please clarify why you see this ticket as duplicate of #167722. Checking on backup-vm I still see 86% usage of /dev/vdb1 and https://stats.openqa-monitor.qa.suse.de/d/GDbackup-vm/dashboard-for-backup-vm?orgId=1&from=now-90d&to=now&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-65090 still shows the alerting state

Actions #5

Updated by okurz about 1 month ago

  • Assignee changed from ybonatakis to nicksinger

as discussed in the weekly coordination call

Actions #6

Updated by nicksinger about 1 month ago

  • Status changed from Feedback to Resolved

okurz wrote in #note-4:

@ybonatakis I don't understand. Please clarify why you see this ticket as duplicate of #167722. Checking on backup-vm I still see 86% usage of /dev/vdb1 and https://stats.openqa-monitor.qa.suse.de/d/GDbackup-vm/dashboard-for-backup-vm?orgId=1&from=now-90d&to=now&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-65090 still shows the alerting state

That was based on a comment from me that the backup I made in #167722 caused the disk on our backup-vm to fill up. I have silenced the alert for now ( https://stats.openqa-monitor.qa.suse.de/alerting/silences ) and will remove my backup again as part of #167722.
