action #92344
openQA Infrastructure (public) - action #92338: [Alerting] File systems alert, / on osd
many temp-folders left over from live openQA jobs, regression?
Description
Observation
While investigating #92338 I found that a directory like /tmp/FOWvYnWzKt from 2021-03-24 05:45, the first non-empty directory, has the following content:
1616561148_719579.png autoinst-log-live.txt last.png serial-terminal-live.txt
This looks more like a regression in openQA or one of its dependencies. And there are many directories like this one:
openqa:/tmp # find -name 'autoinst-log-live.txt' | wc -l
3216
Acceptance criteria
- AC1: No temporary folders with data from live openQA jobs left over after jobs ended and were completely processed
Acceptance tests
- AT1-1: on osd
test $(find -name 'autoinst-log-live.txt' -mtime 1 | wc -l) == 0
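A minimal sketch of how the AT1-1 check could be wrapped for repeated use, assuming GNU or BSD find with -mmin support. The function name count_stale_live_logs and the default threshold of 1440 minutes are hypothetical, not taken from openQA. Note that with find, -mtime 1 only matches files modified between 24 and 48 hours ago, so the sketch uses an open-ended "older than" threshold instead.

```shell
#!/bin/sh
# Hypothetical helper, not part of openQA.
# Usage: count_stale_live_logs DIR [MIN_AGE_MINUTES]
# Prints the number of live-log files under DIR that were last
# modified more than MIN_AGE_MINUTES ago (default: 1440, i.e. one day).
count_stale_live_logs() {
    dir=$1
    min_age=${2:-1440}
    # tr strips the padding some wc implementations add around the count
    find "$dir" -type f -name 'autoinst-log-live.txt' -mmin "+$min_age" \
        | wc -l | tr -d ' '
}
```

On osd this could be called as count_stale_live_logs /tmp, expecting 0 once cleanup works.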
Suggestions
Research whether there was a corresponding change in openQA or in package updates on or before 2021-03-24.
Updated by mkittler over 3 years ago
- Assignee set to mkittler
I've just had a quick look and couldn't find any place where we would ever actually delete the temporary directory. However, there's a comment "# TODO: cleanup previous tmpdir" which should likely be implemented.
Updated by coolo over 3 years ago
Fixing the config for systemd-tmpfiles is actually much easier than finding all places where tmp files are generated. Given that processes can be interrupted, reaching 0 is unlikely that way anyway.
Updated by coolo over 3 years ago
May 14 11:20:29 openqa systemd-tmpfiles[824]: [/etc/tmpfiles.d/tmp.conf:10] Duplicate line for path "/tmp", ignoring.
May 14 11:20:29 openqa systemd-tmpfiles[824]: [/etc/tmpfiles.d/tmp.conf:11] Duplicate line for path "/var/tmp", ignoring.
There is your problem. Split tmp.conf into fs-tmp.conf and fs-var-tmp.conf (and remove the references to /tmp/systemd) and you should be set
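As a rough sketch of the suggested split: systemd-tmpfiles applies only the first entry it reads for a given path, which is why a line in tmp.conf loses against one in fs-tmp.conf. The helper below writes one file per path, using the same names as the packaged files so that copies under /etc/tmpfiles.d take precedence over them. The function name write_split_tmp_conf, the mode/owner fields and the default ages of 10d and 30d are assumptions for illustration, not the actual osd configuration.

```shell
#!/bin/sh
# Hypothetical helper, not the config deployed on osd.
# Usage: write_split_tmp_conf TARGET_DIR [TMP_AGE] [VAR_TMP_AGE]
# Writes one tmpfiles.d file per path so that neither entry ends up
# as a duplicate line inside a differently named file.
write_split_tmp_conf() {
    target=$1
    tmp_age=${2:-10d}      # assumed cleanup age for /tmp
    var_tmp_age=${3:-30d}  # assumed cleanup age for /var/tmp
    mkdir -p "$target" || return 1
    # Fields: type, path, mode, user, group, age
    printf 'd /tmp 1777 root root %s\n' "$tmp_age" > "$target/fs-tmp.conf"
    printf 'd /var/tmp 1777 root root %s\n' "$var_tmp_age" > "$target/fs-var-tmp.conf"
}
```

Writing to a scratch directory first (for example write_split_tmp_conf ./tmpfiles-preview) allows reviewing the result before copying it to /etc/tmpfiles.d and restarting systemd-tmpfiles-clean.service.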
Updated by mkittler over 3 years ago
Ah, we already have a config for that. Using systemd-tmpfiles makes sense, indeed. Then I'll fix it. Maybe it still makes sense to get rid of the TODO comment because it suggests that this would be handled directly within our code.
Updated by coolo over 3 years ago
Well, you can still try to find an 80% solution within the code - but I don't think that's High priority. The High comes from the broken tempfiles config.
And if you wonder "but where did this regression come from?": https://build.suse.de/request/show/235608 was released 2 months ago. It created fs-tmp.conf and as such disabled your tmp.conf as a duplicate.
Updated by mkittler over 3 years ago
I've split the files and removed the tmp.conf file completely because the remaining systemd-related paths/config don't differ from the ones under /usr/lib/tmpfiles.d/ (provided by the filesystem package). After restarting systemd-tmpfiles-clean.service the log messages are indeed gone. Should I add this kind of config to salt? The old tmp.conf wasn't in salt either. (SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488)
I've also checked on o3, where we've got the same problem plus an override for the web UI's temp dir. I've done the splitting there as well.
Updated by okurz over 3 years ago
- Related to action #59391: Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst added
Updated by okurz over 3 years ago
I merged https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488 . Please keep in mind #59391 where we already added tmpfile config but subsequently removed it again after a "tmpfile leak" in openQA was fixed. Not sure why we (or maybe just me) decided to remove the tmpfile handling again.
Updated by mkittler over 3 years ago
Looks like the change worked.
The other ticket is about the worker host. I can still try to avoid the tmpfiles in the code. It would be easy to clean up at least the tmpfiles of a worker's previous job before starting a new one.
Updated by okurz over 3 years ago
mkittler wrote:
Looks like the change worked.
The other ticket is about the worker host. I can still try to avoid the tmpfiles in the code. It would be easy to clean up at least the tmpfiles of a worker's previous job before starting a new one.
Yes, the other ticket is about the worker, so only "remotely related". If you see something that could be done about tempfiles on the central webUI instance then you can do that as part of this ticket. For anything on the worker I suggest another ticket which we do not need to care about soon.
Updated by okurz over 3 years ago
- Status changed from Feedback to In Progress
So as discussed we can try to cover the comment "# TODO: cleanup previous tmpdir", @mkittler please do that
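The idea behind that TODO comment can be sketched roughly as follows: remember the temporary directory of the previous run and remove it before creating a new one. This is only an illustration in shell and not how openQA implements it. The function name rotate_live_tmpdir and the state-file approach are assumptions.

```shell
#!/bin/sh
# Hypothetical helper illustrating "cleanup previous tmpdir"; not openQA code.
# Usage: rotate_live_tmpdir STATE_FILE
# Removes the directory recorded in STATE_FILE (if any), creates a fresh
# temporary directory, records it in STATE_FILE and prints its path.
rotate_live_tmpdir() {
    state_file=$1
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
        # Only remove an existing directory; an empty or stale record is skipped
        if [ -n "$previous" ] && [ -d "$previous" ]; then
            rm -rf -- "$previous"
        fi
    fi
    current=$(mktemp -d) || return 1
    printf '%s\n' "$current" > "$state_file"
    printf '%s\n' "$current"
}
```

This only covers the "80% solution" mentioned earlier: if a process is interrupted before the next call, the last directory stays behind, so the systemd-tmpfiles cleanup is still needed as a safety net.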
Updated by mkittler over 3 years ago
PR for that: https://github.com/os-autoinst/openQA/pull/3905
Updated by openqa_review over 3 years ago
- Due date set to 2021-06-02
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 3 years ago
- Status changed from In Progress to Resolved
The PR has been merged.