
action #59391

Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst

Added by okurz almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2019-11-13
Due date:
2020-05-15
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/138860 showed the root filesystem (/) being overly full on arm-1. It turned out the problem was:

# ls -ltrahS /tmp
total 24G
…
-rw------- 1 _openqa-worker nogroup 969M Sep 28 14:56 mojo.tmp.nq0vLzabb1zWmjGO
-rw------- 1 _openqa-worker nogroup 1.1G Sep 28 15:35 mojo.tmp.F3I7oq122kFrKSfR
-rw------- 1 _openqa-worker nogroup 1.6G Sep 19 19:20 mojo.tmp.JaJQjPsRKeKN_uNA
-rw------- 1 _openqa-worker nogroup 1.8G Sep 28 14:35 mojo.tmp._AGyK7gXhVtqG_nr
-rw------- 1 _openqa-worker nogroup 3.5G Sep 19 18:48 mojo.tmp.bsfcU_u1vvxM0DBk
-rw------- 1 _openqa-worker nogroup 3.5G Nov  8 10:26 mojo.tmp.An73wQN6Zn7AgEX6
-rw------- 1 _openqa-worker nogroup 5.5G Sep 28 14:29 mojo.tmp.7oNEbwY_XExxQnfQ
-rw------- 1 _openqa-worker nogroup 5.6G Sep 28 15:33 mojo.tmp.FOnWm_q01JT9NAEM

Suggestions

mojo.tmp files are mostly uploaded files that Mojolicious stores temporarily during the upload process. The worker sets MOJO_TMPDIR to $cachedir/tmp for those, the web UI to assets/tmp - so the only place where we don't set a MOJO_TMPDIR, AFAIK, is os-autoinst. On /tmp in general, see https://en.opensuse.org/openSUSE:Tmp_on_tmpfs. So regarding /tmp, should we look into setting MOJO_TMPDIR in os-autoinst, into automatic cleanup of /tmp, or just ignore it?
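If os-autoinst followed the worker's example, a minimal sketch could look like the following. The pool path and the per-worker tmp subdirectory are assumptions for illustration, not something confirmed by this ticket; a throwaway directory stands in for the real pool here.

```shell
# Sketch: point Mojolicious temp files at a fast pool directory instead of
# /tmp before starting os-autoinst. mktemp stands in for the real pool path
# (e.g. /var/lib/openqa/pool/1, which is an assumed path) in this example.
POOL_DIR="$(mktemp -d)"
export MOJO_TMPDIR="$POOL_DIR/tmp"
mkdir -p "$MOJO_TMPDIR"
echo "MOJO_TMPDIR=$MOJO_TMPDIR"
```

Mojolicious consults MOJO_TMPDIR when spilling large request/response bodies to disk, so any process started with this environment would write its mojo.tmp.* files under the pool instead of /tmp.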

[13/11/2019 10:48:59] <coolo> coolo@f102#~>cat /etc/tmpfiles.d/tmp.conf 
[13/11/2019 10:48:59] <coolo> d /tmp 1777 root root 10d
[13/11/2019 10:49:06] <coolo> for workers we should go with less days even
[13/11/2019 10:49:28] <coolo> but os-autoinst using /tmp is problematic in itself - we got pools on ssds and / on slow, small disks
[13/11/2019 10:50:01] <coolo> so setting MOJO_TMPDIR to pool directory is due anyway - I wonder why tests would upload GBs to os-autoinst though
[13/11/2019 10:50:08] <coolo> I mean these files weren't exactly small
[13/11/2019 10:52:09] <okurz> you know our testers, maybe someone downloading from within tests? like these xen images? but not these, as we are on arm
[13/11/2019 10:53:04] <sebastianriedel> Correct, it should be exclusively temporary uploads that were too large to put into memory and that were not cleaned up properly for "reasons"
[13/11/2019 10:53:49] <sebastianriedel> (or that are still being processed of course)
[13/11/2019 10:58:08] <coolo> with arm workers doing sudden deaths, I wouldn't worry about the cleanup part
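coolo's tmpfiles.d snippet above keeps /tmp entries for 10 days; for workers he suggested a shorter retention. A hypothetical worker override could look like this (the 3d value is illustrative only, not taken from this ticket):

```
# /etc/tmpfiles.d/tmp.conf on a worker: clean /tmp entries older than 3 days
# (3d is an assumed value; coolo only said "less days" than the 10d default)
d /tmp 1777 root root 3d
```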

Related issues

Related to openQA Project - action #92344: many temp-folders left over from live openQA jobs, regression? (Resolved, 2021-05-08 - 2021-06-02)

Copied from openQA Infrastructure - action #59388: arm-1 / out-of-space warning (Resolved, 2019-11-13)

History

#1 Updated by okurz almost 2 years ago

#2 Updated by okurz almost 2 years ago

  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Current Sprint

#3 Updated by okurz almost 2 years ago

  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version changed from Current Sprint to Ready

MR merged. Next step: prevent os-autoinst from writing to the potentially slow /tmp and use MOJO_TMPDIR in the pool directory like openQA does.

#4 Updated by okurz almost 2 years ago

  • Assignee set to kraih

… as you asked in chat :)

#5 Updated by kraih almost 2 years ago

  • Status changed from Workable to In Progress

#6 Updated by kraih almost 2 years ago

The cache service has a bug where it creates its own tmp directory but never actually uses it for downloads.

#7 Updated by kraih almost 2 years ago

  • Priority changed from Low to Normal

#8 Updated by okurz almost 2 years ago

The original problem was not about empty temp directories though but temp files that are used, are big and left behind.

#9 Updated by kraih almost 2 years ago

Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.

#10 Updated by kraih almost 2 years ago

FYI. Mojolicious does not create tmp files for uploads, ever. Those are exclusively for downloads.

#11 Updated by okurz almost 2 years ago

kraih wrote:

Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.

yes. This makes sense now, thank you :)

#12 Updated by kraih almost 2 years ago

  • Status changed from In Progress to Feedback

A possible solution has been applied. Let's see if it resolves the issue: https://github.com/os-autoinst/openQA/pull/2555

#13 Updated by cdywan over 1 year ago

  • Target version changed from Ready to Current Sprint

#14 Updated by cdywan over 1 year ago

Looking at https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453 seems to reveal 85.74 % usage on sda3 (btrfs) cf. 85.22 % on mapper/system-root (btrfs). Although it doesn't show what occupies that space... I gather the ls used above is not used anywhere in the CI script.

#15 Updated by okurz over 1 year ago

You are referring to https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453#L28 which is an alert about openqaworker-arm-2, a worker machine, not a webui host. For this ticket I was waiting for our Scrum Master to remind kraih to update his ticket ;) When we confirmed the feature works we should remove the custom tmpfile cleanup in salt.

#16 Updated by cdywan over 1 year ago

okurz wrote:

You are referring to https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453#L28 which is an alert about openqaworker-arm-2, a worker machine, not a webui host. For this ticket I was waiting for our Scrum Master to remind kraih to update his ticket ;) When we confirmed the feature works we should remove the custom tmpfile cleanup in salt.

I was trying to determine if this ticket is done, and that's what I figured based on the Motivation section of the ticket. Surely the OP can tell me what the correct measure is, in addition to pointing out that I was wrong?

#17 Updated by okurz over 1 year ago

  • Due date set to 2020-05-15
  • Assignee changed from kraih to okurz

Yes, I mixed up webui and worker here.

I will delete the tmpfiles cleanup from workers again with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/303 and will monitor if openQA is actually cleaning up after itself.

#18 Updated by okurz over 1 year ago

  • Status changed from Feedback to Resolved

Checked with salt -l error --no-color -C 'G@roles:worker' cmd.run "ls -ltra /tmp"; /tmp looks rather clean on the workers.
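A spot check like the one above can also be scripted. A hedged sketch (the helper name and the one-day staleness threshold are assumptions, not from this ticket) that counts leftover Mojolicious temp files in a given directory:

```shell
# Sketch: count leftover Mojolicious temp files older than one day in a
# directory, as a scripted stand-in for the manual
# `salt ... cmd.run "ls -ltra /tmp"` spot check used in this ticket.
# count_stale_mojo_tmp is a hypothetical helper; the 1-day cutoff is assumed.
count_stale_mojo_tmp() {
    find "$1" -maxdepth 1 -name 'mojo.tmp.*' -mtime +1 2>/dev/null | wc -l
}
```

Running it against /tmp on every worker and alerting on a non-zero count would catch a regression of this problem early.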

#19 Updated by okurz 4 months ago

  • Related to action #92344: many temp-folders left over from live openQA jobs, regression? added
