action #59391
Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst
Status: closed
Description
Motivation
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/138860 showed / being overly full on arm-1. It turned out the problem was:
# ls -ltrahS /tmp
total 24G
…
-rw------- 1 _openqa-worker nogroup 969M Sep 28 14:56 mojo.tmp.nq0vLzabb1zWmjGO
-rw------- 1 _openqa-worker nogroup 1.1G Sep 28 15:35 mojo.tmp.F3I7oq122kFrKSfR
-rw------- 1 _openqa-worker nogroup 1.6G Sep 19 19:20 mojo.tmp.JaJQjPsRKeKN_uNA
-rw------- 1 _openqa-worker nogroup 1.8G Sep 28 14:35 mojo.tmp._AGyK7gXhVtqG_nr
-rw------- 1 _openqa-worker nogroup 3.5G Sep 19 18:48 mojo.tmp.bsfcU_u1vvxM0DBk
-rw------- 1 _openqa-worker nogroup 3.5G Nov 8 10:26 mojo.tmp.An73wQN6Zn7AgEX6
-rw------- 1 _openqa-worker nogroup 5.5G Sep 28 14:29 mojo.tmp.7oNEbwY_XExxQnfQ
-rw------- 1 _openqa-worker nogroup 5.6G Sep 28 15:33 mojo.tmp.FOnWm_q01JT9NAEM
Suggestions
mojo.tmp files are mostly uploaded files that Mojolicious stores during the upload process. The worker sets MOJO_TMPDIR to $cachedir/tmp for those and the webui uses assets/tmp, so the only place where we don't set MOJO_TMPDIR, AFAIK, is os-autoinst. Regarding /tmp itself see https://en.opensuse.org/openSUSE:Tmp_on_tmpfs . So should we look into setting MOJO_TMPDIR in os-autoinst, into automatic cleanup of /tmp, or just ignore it? A sketch of the MOJO_TMPDIR mechanism follows below.
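For illustration, a minimal sketch of that mechanism in Perl (the directory path is an assumption, not taken from this ticket): Mojolicious spills large message bodies into Mojo::Asset::File objects, and those pick their directory from MOJO_TMPDIR, which is why setting the variable per process redirects the mojo.tmp.* files away from /tmp.

use Mojo::Base -strict;
use Mojo::Asset::File;

# Hypothetical target directory (must already exist); openQA points this at the
# cache or pool directory instead of relying on the default /tmp
$ENV{MOJO_TMPDIR} = '/var/lib/openqa/pool/1/tmp';

my $asset = Mojo::Asset::File->new;
$asset->add_chunk('large payload ...');
# The backing file is created lazily under MOJO_TMPDIR with a mojo.tmp.* name
say $asset->path;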
[13/11/2019 10:48:59] <coolo> coolo@f102#~>cat /etc/tmpfiles.d/tmp.conf
[13/11/2019 10:48:59] <coolo> d /tmp 1777 root root 10d
[13/11/2019 10:49:06] <coolo> for workers we should go with less days even
[13/11/2019 10:49:28] <coolo> but os-autoinst using /tmp is problematic in itself - we got pools on ssds and / on slow, small disks
[13/11/2019 10:50:01] <coolo> so setting MOJO_TMPDIR to pool directory is due anyway - I wonder why tests would upload GBs to os-autoinst though
[13/11/2019 10:50:08] <coolo> I mean these files weren't exactly small
[13/11/2019 10:52:09] <okurz> you know our testers, maybe someone downloading from within tests? like these xen images? but not these, as we are on arm
[13/11/2019 10:53:04] <sebastianriedel> Correct, it should be exclusively temporary uploads that were too large to put into memory and that were not cleaned up properly for "reasons"
[13/11/2019 10:53:49] <sebastianriedel> (or that are still being processed of course)
[13/11/2019 10:58:08] <coolo> with arm workers doing sudden deaths, I wouldn't worry about the cleanup part
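Following coolo's suggestion of fewer days on workers, a hypothetical tmpfiles.d override for worker hosts could look like the following (the 2d age is an assumption for this sketch, not a value from the ticket):

# /etc/tmpfiles.d/tmp.conf, hypothetical override on openQA worker hosts
d /tmp 1777 root root 2d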
Updated by okurz about 5 years ago
- Copied from action #59388: arm-1 / out-of-space warning added
Updated by okurz about 5 years ago
- Status changed from New to Feedback
- Assignee set to okurz
- Target version set to Current Sprint
Updated by okurz about 5 years ago
- Status changed from Feedback to Workable
- Assignee deleted (okurz)
- Target version changed from Current Sprint to Ready
MR merged. Next step: prevent os-autoinst from writing to the potentially slow /tmp and instead use MOJO_TMPDIR pointing into the pool directory, like openQA does.
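A minimal sketch of what that could look like in os-autoinst (not the actual change; using the current working directory as the pool directory is an assumption based on os-autoinst being started inside the pool dir):

use Mojo::Base -strict;
use Cwd 'getcwd';
use Mojo::File 'path';

# Respect an explicitly configured location, otherwise fall back to <pool dir>/tmp
unless ($ENV{MOJO_TMPDIR}) {
    my $tmpdir = path(getcwd(), 'tmp')->make_path;
    $ENV{MOJO_TMPDIR} = $tmpdir->to_string;
}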
Updated by kraih almost 5 years ago
The cache service has a bug where it creates its own tmp directory but never actually uses it for downloads.
Updated by okurz almost 5 years ago
The original problem was not about empty temp directories though, but about temp files that are actually used, are big, and are left behind.
Updated by kraih almost 5 years ago
Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.
Updated by kraih almost 5 years ago
FYI. Mojolicious does not create tmp files for uploads, ever. Those are exclusively for downloads.
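To illustrate this, a minimal sketch (not the actual cache service code; URL and paths are placeholders): a download whose body grows beyond the in-memory limit is spilled into a mojo.tmp.* file under MOJO_TMPDIR until the transaction is done with it.

use Mojo::Base -strict;
use Mojo::UserAgent;

$ENV{MOJO_TMPDIR} = '/var/lib/openqa/cache/tmp';    # assumed directory, must exist

my $ua = Mojo::UserAgent->new(max_response_size => 0);    # do not limit response size
my $tx = $ua->get('https://example.com/big-asset.qcow2'); # placeholder URL
# Bodies larger than MOJO_MAX_MEMORY_SIZE (256 KiB by default) are written to a
# temporary mojo.tmp.* file under MOJO_TMPDIR instead of being kept in memory
$tx->result->save_to('big-asset.qcow2');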
Updated by okurz almost 5 years ago
kraih wrote:
Not using its own tmp dir means that the cache service is currently using /tmp to store tmp files for every download it performs.
yes. This makes sense now, thank you :)
Updated by kraih almost 5 years ago
- Status changed from In Progress to Feedback
A possible solution has been applied. Let's see if it resolves the issue: https://github.com/os-autoinst/openQA/pull/2555
Updated by livdywan almost 5 years ago
- Target version changed from Ready to Current Sprint
Updated by livdywan over 4 years ago
Looking at https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453 seems to reveal 85.73648718727136 on sda3 (btrfs), cf. the 85.22238780975964 on mapper/system-root (btrfs). Although it doesn't show what occupies that space... I gather the ls used above is not used anywhere in the CI script.
Updated by okurz over 4 years ago
You are referring to https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453#L28 which is an alert about openqaworker-arm-2, a worker machine, not a webui host. For this ticket I was waiting for our Scrum Master to remind kraih to update his ticket ;) Once we have confirmed the feature works we should remove the custom tmpfile cleanup in salt.
Updated by livdywan over 4 years ago
okurz wrote:
You are referring to https://gitlab.suse.de/openqa/osd-deployment/-/jobs/176453#L28 which is an alert about openqaworker-arm-2, a worker machine, not a webui host. For this ticket I was waiting for our Scrum Master to remind kraih to update his ticket ;) Once we have confirmed the feature works we should remove the custom tmpfile cleanup in salt.
I was trying to determine if this ticket is done, and that's what I figured based on the Motivation of the ticket. Surely the OP can correct me on what the correct measure is, in addition to pointing out that I was wrong?
Updated by okurz over 4 years ago
- Due date set to 2020-05-15
- Assignee changed from kraih to okurz
Yes, I mixed up webui and worker here.
I will delete the tmpfiles cleanup from workers again with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/303 and will monitor if openQA is actually cleaning up after itself.
Updated by okurz over 4 years ago
- Status changed from Feedback to Resolved
Checked with salt -l error --no-color -C 'G@roles:worker' cmd.run "ls -ltra /tmp" and /tmp looks rather clean on workers.
Updated by okurz over 3 years ago
- Related to action #92344: many temp-folders left over from live openQA jobs, regression? added