action #92344

openQA Infrastructure - action #92338: [Alerting] File systems alert, / on osd

many temp-folders left over from live openQA jobs, regression?

Added by okurz 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2021-05-08
Due date: 2021-06-02
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

During my investigation of #92338 I found that a directory like /tmp/FOWvYnWzKt from 2021-03-24 05:45, the first non-empty directory, has the following content:

1616561148_719579.png  autoinst-log-live.txt  last.png  serial-terminal-live.txt

This looks more like a regression in openQA or one of its dependencies. And there are many directories like this:

openqa:/tmp # find -name 'autoinst-log-live.txt' | wc -l
3216

Acceptance criteria

  • AC1: No temporary folders with data from live openQA jobs left over after jobs ended and were completely processed

Acceptance tests

  • AT1-1: on osd test $(find -name 'autoinst-log-live.txt' -mtime 1 | wc -l) == 0
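A note on the check above: with find, `-mtime 1` only matches files last modified between 24 and 48 hours ago, so older leftovers would not be counted. A minimal sketch of a variant that counts everything older than one day follows. The function name `count_stale_live_logs` and the one-day threshold are assumptions for illustration, not part of the acceptance test agreed on in this ticket.

```shell
# Hypothetical helper: count live-log files that are at least one day old.
# "-mtime +0" matches files whose age, rounded down to whole days, is greater
# than 0, i.e. files last modified more than 24 hours ago.
count_stale_live_logs() {
    dir=${1:-/tmp}
    find "$dir" -name 'autoinst-log-live.txt' -mtime +0 2>/dev/null | wc -l | tr -d ' '
}
```

It could be used as `test "$(count_stale_live_logs /tmp)" -eq 0`, which also avoids the non-portable `==` operator of `test`.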

Suggestions

Research whether there was a corresponding change in openQA or in package updates on or shortly before 2021-03-24.


Related issues

Related to openQA Project - action #59391: Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst (Resolved, 2019-11-13, 2020-05-15)

History

#1 Updated by mkittler 2 months ago

  • Assignee set to mkittler

I've just had a quick look and couldn't find any place where we ever delete the temporary directory. However, there's a comment "# TODO: cleanup previous tmpdir" which should likely be implemented.

#2 Updated by coolo 2 months ago

Fixing the config for systemd-tmpfiles is actually much easier than finding all the places where tmp files are generated. Given that processes can be interrupted, reaching 0 leftover directories via code changes alone is unlikely anyway.

#3 Updated by coolo 2 months ago

May 14 11:20:29 openqa systemd-tmpfiles[824]: [/etc/tmpfiles.d/tmp.conf:10] Duplicate line for path "/tmp", ignoring.
May 14 11:20:29 openqa systemd-tmpfiles[824]: [/etc/tmpfiles.d/tmp.conf:11] Duplicate line for path "/var/tmp", ignoring.

There is your problem. Split tmp.conf into fs-tmp.conf and fs-var-tmp.conf (and remove the references to /tmp/systemd) and you should be set
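For context: when several tmpfiles.d files configure the same path, systemd-tmpfiles applies the entry from the file whose name sorts first and reports the later ones as duplicates, as in the log lines above. The following is a rough sketch of how such clashes could be spotted by hand. The function name `find_duplicate_tmpfiles_paths` is made up for illustration, the caller is assumed to pass the files in the order they would be read, and only the second column (the path) is compared, which is a simplification of what systemd-tmpfiles actually checks.

```shell
# Hypothetical helper: report paths that are configured in more than one
# tmpfiles.d file. The files must be given in the order they would be read.
find_duplicate_tmpfiles_paths() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        {
            path = $2                   # second column of a tmpfiles.d line
            if (!(path in seen))
                seen[path] = FILENAME   # remember the first file for the path
            else if (seen[path] != FILENAME)
                printf "%s: %s already configured in %s\n", FILENAME, path, seen[path]
        }
    ' "$@"
}
```

A call such as `find_duplicate_tmpfiles_paths /usr/lib/tmpfiles.d/fs-tmp.conf /etc/tmpfiles.d/tmp.conf` would then print one line per clashing path.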

#4 Updated by mkittler 2 months ago

Ah, we already have a config for that. Using systemd-tmpfiles makes sense indeed, so I'll fix it. Maybe it still makes sense to get rid of the TODO comment because it suggests that this would be handled directly within our code.

#5 Updated by coolo 2 months ago

Well, you can still try to find an 80% solution within the code, but I don't think that's High priority. The High comes from the broken tempfiles config.

And if you wonder "but where did this regression come from?": https://build.suse.de/request/show/235608 was released 2 months ago. It creates fs-tmp.conf, which causes the entries in your tmp.conf to be ignored as duplicates.

#6 Updated by mkittler 2 months ago

I've split the files and removed tmp.conf completely because the remaining systemd-related paths/config don't differ from those under /usr/lib/tmpfiles.d/ (provided by the filesystem package). After restarting systemd-tmpfiles-clean.service the log messages are indeed gone. Should I add this kind of config to salt? The old tmp.conf wasn't in salt either. (SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488)

I've also checked o3, where we have the same problem plus an override for the web UI's temp dir. I did the splitting there as well.

#7 Updated by mkittler 2 months ago

  • Status changed from Workable to Feedback

#8 Updated by okurz 2 months ago

  • Related to action #59391: Prevent depletion of space on /tmp due to mojo.tmp files from os-autoinst added

#9 Updated by okurz 2 months ago

I merged https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488 . Please keep in mind #59391 where we already added tmpfile config but subsequently removed it again after a "tmpfile leak" in openQA was fixed. Not sure why we (or maybe just me) decided to remove the tmpfile handling again.

#10 Updated by mkittler 2 months ago

Looks like the change worked.

The other ticket is about the worker host. I can still try to avoid the tmpfiles in the code. It would be easy to clean up at least the tmpfiles of a worker's previous job before starting a new one.
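The idea of cleaning up the previous job's temporary directory before starting a new one can be sketched roughly as follows. This is only an illustration in shell, not the openQA implementation: the function name `start_job_with_fresh_tmpdir`, the state file, and the `livejob.` directory prefix are all hypothetical.

```shell
# Hypothetical sketch: remove the previous job's tmpdir, then create a new one.
# The path of the current tmpdir is remembered in a state file ($1).
start_job_with_fresh_tmpdir() {
    state_file=$1
    base=${TMPDIR:-/tmp}
    if [ -f "$state_file" ]; then
        prev=$(cat "$state_file")
        # Only delete directories that match our own naming scheme.
        case "$prev" in
            "$base"/livejob.*) rm -rf -- "$prev" ;;
        esac
    fi
    new=$(mktemp -d "$base/livejob.XXXXXX") || return 1
    printf '%s\n' "$new" > "$state_file"
    printf '%s\n' "$new"
}
```

As noted earlier in this ticket, this can only be a partial solution: if a process is interrupted before the next job starts, its last directory stays behind until systemd-tmpfiles removes it.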

#11 Updated by okurz 2 months ago

mkittler wrote:

Looks like the change worked.

The other ticket is about the worker host. I can still try to avoid the tmpfiles in the code. It would be easy to clean up at least the tmpfiles of a worker's previous job before starting a new one.

Yes, the other ticket is about the worker, so only "remotely related". If you see something that could be done about tempfiles on the central webUI instance then you can do that as part of this ticket. For anything on the worker I suggest another ticket which we do not need to care about soon.

#12 Updated by okurz 2 months ago

  • Status changed from Feedback to In Progress

So as discussed we can try to cover the comment "# TODO: cleanup previous tmpdir", mkittler please do that

#14 Updated by openqa_review 2 months ago

  • Due date set to 2021-06-02

Setting due date based on mean cycle time of SUSE QE Tools

#15 Updated by mkittler 2 months ago

  • Status changed from In Progress to Resolved

The PR has been merged.
