action #102143

closed

o3 ran out of disk space

Added by mkittler almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-11-09
Due date:
% Done: 0%
Estimated time:

Description

Running rm -rf /var/cache/zypp/packages helped as a temporary measure. Log rotation for /var/log/openqa_scheduler and /var/log/openqa_gru seems to be broken.

acceptance criteria

  • AC1: Logs are rotated so they don't become too big
  • AC2: Temporary files are cleaned up automatically

Files

o3-disk-space.png (110 KB), added by tinita, 2021-11-11 13:11

Related issues 1 (0 open, 1 closed)

Copied to openQA Infrastructure - coordination #102266: [epic] o3 ran out of disk space (Resolved, okurz, 2021-12-21)

Actions #1

Updated by okurz almost 3 years ago

  • Target version set to Ready
Actions #2

Updated by mkittler almost 3 years ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler almost 3 years ago

  • Status changed from New to Feedback
  • I've moved some big files from home directories to /space.
  • I configured logrotate to also cover the gru and scheduler logs (same config as on OSD).
  • I've moved some log files to /space/logs because there was still not enough disk space left to perform the log rotation.

We should have enough free disk space again:

/dev/vda1                20G    6,7G   13G   36% /

This should do it for now. Other areas don't look problematic (checked via ncdu -x /). If necessary we can still reduce the log storage duration or store the openQA logs under /space/logs.
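For repeating this kind of check without ncdu, here is a minimal sketch in Python that lists the largest entries directly below a directory while staying on one filesystem (similar in spirit to the -x flag). The function names dir_size and largest_children are made up for this illustration, and the sketch sums apparent file sizes rather than allocated blocks, so its numbers can differ from what du or ncdu report.

```python
import os


def _on_device(path, device):
    """Return True if path can be stat'ed and lives on the given device."""
    try:
        return os.lstat(path).st_dev == device
    except OSError:
        return False


def dir_size(path, device):
    """Sum apparent file sizes (bytes) under path without crossing filesystems."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        # Prune subdirectories on other devices so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), device)]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def largest_children(path, top=10):
    """Return up to `top` (size, path) pairs for entries directly below path."""
    device = os.lstat(path).st_dev
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if entry.is_dir(follow_symlinks=False):
                if st.st_dev != device:
                    continue  # a different filesystem is mounted here
                size = dir_size(entry.path, device)
            else:
                size = st.st_size
            sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]
```

Calling largest_children("/var/log") as root would print the biggest log directories first; unreadable entries are silently skipped, so run it with sufficient permissions.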


It is still not clear why the logrotate config was changed to remove handling of the scheduler and gru logs. Rotated scheduler/gru logs from September exist, and the logrotate config was modified on 14 October.

Actions #4

Updated by livdywan almost 3 years ago

Let's conduct 5 Whys tomorrow at 14.00 CE(S)T. At least Oli and I would find it useful to understand why this wasn't working before and what changed.

Actions #5

Updated by tinita almost 3 years ago

Attached screenshot of munin disk space graph.

Actions #6

Updated by livdywan almost 3 years ago

  • Why did our monitoring not inform us of the issue?
    • thruk.suse.de didn't send any emails
    • This was brought up in eng-testing (Slack) by Oleksandr on Monday Dec 8 17.08 CEST: o3 was returning 503 and, from a user's point of view, the web UI seemed not to be running
    • Fabian tried rm -rf /var/cache/zypp/packages and restarted openqa-livehandler.service
1,9G    /var/log/journal
2,7G    /var/log/openqa_scheduler
6,9G    /var/log/openqa_gru
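As an illustration of the kind of check that could have warned us earlier, here is a minimal sketch of a free-space threshold test in Python. It is not what thruk or munin do; the function name disk_status and the 15% default threshold are assumptions chosen for the example, and the usage parameter exists only so the check can be exercised without a real filesystem.

```python
import shutil


def disk_status(path="/", min_free_fraction=0.15, usage=shutil.disk_usage):
    """Return (ok, message) for the filesystem containing path.

    min_free_fraction is a hypothetical threshold, not a value from our monitoring.
    """
    total, used, free = usage(path)
    if total == 0:
        raise ValueError(f"{path} reports a total size of 0 bytes")
    free_fraction = free / total
    ok = free_fraction >= min_free_fraction
    state = "OK" if ok else "LOW"
    message = f"{state}: {path} has {free_fraction:.0%} free ({free} of {total} bytes)"
    return ok, message
```

A cron job or systemd timer could call disk_status("/") and send a mail when ok is False; how the alert is delivered is left open here.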

Improvements

Actions #7

Updated by livdywan almost 3 years ago

Actions #8

Updated by mkittler almost 3 years ago

  • Status changed from Feedback to Resolved

Looks like the changed logrotate config works. Since we've created a follow-up ticket for further improvements I'm resolving this one.
