action #59391

Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst

Added by okurz 5 months ago. Updated 26 days ago.

Status: Feedback
Start date: 13/11/2019
Priority: Normal
Due date:
Assignee: kraih
% Done: 0%
Category: Feature requests
Target version: Current Sprint
Difficulty:
Duration:

Description

Motivation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/138860 showed that / was overly full on arm-1. It turned out that the problem is leftover mojo.tmp files in /tmp:

# ls -ltrahS /tmp
total 24G
…
-rw------- 1 _openqa-worker nogroup 969M Sep 28 14:56 mojo.tmp.nq0vLzabb1zWmjGO
-rw------- 1 _openqa-worker nogroup 1.1G Sep 28 15:35 mojo.tmp.F3I7oq122kFrKSfR
-rw------- 1 _openqa-worker nogroup 1.6G Sep 19 19:20 mojo.tmp.JaJQjPsRKeKN_uNA
-rw------- 1 _openqa-worker nogroup 1.8G Sep 28 14:35 mojo.tmp._AGyK7gXhVtqG_nr
-rw------- 1 _openqa-worker nogroup 3.5G Sep 19 18:48 mojo.tmp.bsfcU_u1vvxM0DBk
-rw------- 1 _openqa-worker nogroup 3.5G Nov  8 10:26 mojo.tmp.An73wQN6Zn7AgEX6
-rw------- 1 _openqa-worker nogroup 5.5G Sep 28 14:29 mojo.tmp.7oNEbwY_XExxQnfQ
-rw------- 1 _openqa-worker nogroup 5.6G Sep 28 15:33 mojo.tmp.FOnWm_q01JT9NAEM
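
To put a number on how much space such leftovers occupy, the listing above can be reduced to a single total. This is only a minimal sketch, assuming GNU find (for -maxdepth and -printf); the function name mojo_tmp_total_bytes is made up for illustration and is not part of os-autoinst or openQA.

```shell
# Hypothetical helper: print the total size in bytes of all mojo.tmp.*
# files directly inside DIR (default: /tmp). Assumes GNU find.
mojo_tmp_total_bytes() {
    dir="${1:-/tmp}"
    # -maxdepth 1: do not descend into subdirectories
    # -type f:     regular files only
    # -printf %s:  one file size in bytes per line
    find "$dir" -maxdepth 1 -type f -name 'mojo.tmp.*' -printf '%s\n' \
        | awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Calling `mojo_tmp_total_bytes /tmp` on a worker would print one number that can be compared against the free space on /.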

Suggestions

mojo.tmp files are mostly uploaded files that Mojolicious stores during the upload process. The worker sets $cachedir/tmp for those and the webui uses assets/tmp, so as far as I know the only place where we don't set MOJO_TMPDIR is os-autoinst. For background on /tmp itself, see https://en.opensuse.org/openSUSE:Tmp_on_tmpfs . So regarding /tmp, should we look into MOJO_TMPDIR in os-autoinst, set up automatic cleanup of /tmp, or ignore it?

[13/11/2019 10:48:59] <coolo> coolo@f102#~>cat /etc/tmpfiles.d/tmp.conf 
[13/11/2019 10:48:59] <coolo> d /tmp 1777 root root 10d
[13/11/2019 10:49:06] <coolo> for workers we should go with less days even
[13/11/2019 10:49:28] <coolo> but os-autoinst using /tmp is problematic in itself - we got pools on ssds and / on slow, small disks
[13/11/2019 10:50:01] <coolo> so setting MOJO_TMPDIR to pool directory is due anyway - I wonder why tests would upload GBs to os-autoinst though
[13/11/2019 10:50:08] <coolo> I mean these files weren't exactly small
[13/11/2019 10:52:09] <okurz> you know our testers, maybe someone downloading from within tests? like these xen images? but not these, as we are on arm
[13/11/2019 10:53:04] <sebastianriedel> Correct, it should be exclusively temporary uploads that were too large to put into memory and that were not cleaned up properly for "reasons"
[13/11/2019 10:53:49] <sebastianriedel> (or that are still being processed of course)
[13/11/2019 10:58:08] <coolo> with arm workers doing sudden deaths, I wouldn't worry about the cleanup part
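
The idea from the chat, pointing MOJO_TMPDIR at the pool directory, could look roughly like the following. This is a sketch under assumptions, not how os-autoinst or the worker actually does it: the wrapper name run_with_pool_tmpdir and the tmp subdirectory name are hypothetical. MOJO_TMPDIR itself is the environment variable Mojolicious consults when choosing where to create its temporary files.

```shell
# Hypothetical wrapper: run COMMAND with MOJO_TMPDIR pointing at a tmp
# directory inside the given pool directory instead of /tmp.
# Usage: run_with_pool_tmpdir POOL_DIR COMMAND [ARGS...]
run_with_pool_tmpdir() {
    pool_dir="$1"
    shift
    tmp_dir="$pool_dir/tmp"
    # Create the directory first so the started process can write to it.
    mkdir -p "$tmp_dir" || return 1
    # env sets the variable only for the started command, not for the caller.
    env MOJO_TMPDIR="$tmp_dir" "$@"
}
```

With this, mojo.tmp.* files would end up on the same disk as the pool, and removing the pool's tmp directory between jobs would also remove any leftovers.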

Related issues

Copied from openQA Infrastructure - action #59388: arm-1 / out-of-space warning Resolved 13/11/2019

History

#1 Updated by okurz 5 months ago

#2 Updated by okurz 5 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Current Sprint

#3 Updated by okurz 5 months ago

  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version changed from Current Sprint to Ready

MR merged. Next step: Prevent os-autoinst from writing to the potentially slow /tmp and have it use MOJO_TMPDIR in the pool directory instead, like openQA does.

#4 Updated by okurz 4 months ago

  • Assignee set to kraih

… as you asked in chat :)

#5 Updated by kraih 4 months ago

  • Status changed from Workable to In Progress

#6 Updated by kraih 4 months ago

The cache service has a bug where it creates its own tmp directory but never actually uses it for downloads.

#7 Updated by kraih 4 months ago

  • Priority changed from Low to Normal

#8 Updated by okurz 4 months ago

The original problem was not about empty temp directories, though, but about temp files that are used, are big, and are left behind.

#9 Updated by kraih 4 months ago

Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.

#10 Updated by kraih 4 months ago

FYI. Mojolicious does not create tmp files for uploads, ever. Those are exclusively for downloads.

#11 Updated by okurz 4 months ago

kraih wrote:

Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.

yes. This makes sense now, thank you :)

#12 Updated by kraih 4 months ago

  • Status changed from In Progress to Feedback

A possible solution has been applied. Let's see if it resolves the issue. https://github.com/os-autoinst/openQA/pull/2555

#13 Updated by cdywan 2 months ago

  • Target version changed from Ready to Current Sprint

#14 Updated by cdywan 26 days ago

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453 seems to show 85.73648718727136 on sda3 (btrfs), compared with 85.22238780975964 on mapper/system-root (btrfs). It doesn't show what occupies that space, though. I gather the ls from the description is not run anywhere in the CI script.

#15 Updated by okurz 26 days ago

You are referring to https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453#L28 which is an alert about openqaworker-arm-2, a worker machine, not a webui host. For this ticket I was waiting for our Scrum Master to remind kraih to update his ticket ;) Once we have confirmed that the feature works, we should remove the custom tmpfile cleanup in salt.
