openQA Infrastructure - action #92338: [Alerting] File systems alert, / on osd
Many temp folders left over from live openQA jobs, regression?
In my investigation for #92338 I found that a directory like /tmp/FOWvYnWzKt from 2021-03-24 05:45, the first non-empty one, has the following content:
1616561148_719579.png autoinst-log-live.txt last.png serial-terminal-live.txt
This looks more like a regression in openQA or one of its dependencies, and there are many directories like this:
openqa:/tmp # find -name 'autoinst-log-live.txt' | wc -l
3216
- AC1: No temporary folders with data from live openQA jobs are left over after the jobs have ended and were completely processed
- AT1-1: on osd
test "$(find /tmp -name 'autoinst-log-live.txt' -mtime 1 | wc -l)" -eq 0
Research whether there was a corresponding change in openQA or a package update on or before 2021-03-24.
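To narrow down when the leftovers started, which helps with the research item above, the search from AT1-1 could be grouped by modification day. A minimal sketch, assuming GNU find (for `-printf`); the function name `live_leftovers_per_day` is made up for illustration:

```shell
# Hypothetical helper: count leftover live-log files per modification day.
# Requires GNU find for -printf; DIR defaults to /tmp.
live_leftovers_per_day() {
    dir=${1:-/tmp}
    # %TY-%Tm-%Td prints each file's mtime as YYYY-MM-DD (local time),
    # which sorts chronologically as a plain string
    find "$dir" -name 'autoinst-log-live.txt' -printf '%TY-%Tm-%Td\n' |
        sort | uniq -c
}
```

Running `live_leftovers_per_day /tmp | head -n 5` would list the earliest days first, so the first line points at the day the leftovers began to accumulate.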
May 14 11:20:29 openqa systemd-tmpfiles: [/etc/tmpfiles.d/tmp.conf:10] Duplicate line for path "/tmp", ignoring.
May 14 11:20:29 openqa systemd-tmpfiles: [/etc/tmpfiles.d/tmp.conf:11] Duplicate line for path "/var/tmp", ignoring.
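Warnings like these name the ignored config file and line. A small filter can pull them out of the journal so that all conflicting entries are visible at once. This is a sketch that assumes the message format shown above; `duplicate_tmpfiles_entries` is a hypothetical name:

```shell
# Hypothetical filter: turn systemd-tmpfiles "Duplicate line" warnings
# read from stdin into "FILE LINE PATH" records; other lines are dropped.
duplicate_tmpfiles_entries() {
    sed -n 's/.*\[\([^]:]*\):\([0-9]*\)] Duplicate line for path "\([^"]*\)", ignoring.*/\1 \2 \3/p'
}
```

For example: `journalctl -t systemd-tmpfiles | duplicate_tmpfiles_entries | sort -u`.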
There is your problem. Split tmp.conf into fs-tmp.conf and fs-var-tmp.conf (and remove the references to /tmp/systemd), and you should be set.
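A minimal sketch of such a split, assuming the cleanup ages from upstream systemd's `tmp.conf` (10 days for /tmp, 30 days for /var/tmp); the values actually wanted on osd are not stated in this ticket, and `write_split_tmpfiles_conf` is a hypothetical helper. Because the new files carry the same names as the ones in /usr/lib/tmpfiles.d/, the copies in /etc replace them by name instead of conflicting with them:

```shell
# Hypothetical helper: write the split tmpfiles.d config into CONF_DIR
# (default /etc/tmpfiles.d). The ages 10d/30d are assumed upstream
# systemd defaults, not values taken from osd.
write_split_tmpfiles_conf() {
    conf_dir=${1:-/etc/tmpfiles.d}
    mkdir -p "$conf_dir" || return 1
    # q lines as in upstream tmp.conf: create the directory with mode 1777
    # and clean up entries older than the given age
    printf '%s\n' 'q /tmp 1777 root root 10d' > "$conf_dir/fs-tmp.conf" || return 1
    printf '%s\n' 'q /var/tmp 1777 root root 30d' > "$conf_dir/fs-var-tmp.conf"
}
```

Afterwards, restarting `systemd-tmpfiles-clean.service` shows whether the duplicate warnings are gone; the old tmp.conf can then be dropped if its remaining entries match what /usr/lib/tmpfiles.d/ already provides.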
Well, you can still try to find an 80% solution within the code, but I don't think that is High priority. The High priority comes from the broken tmpfiles config.
And if you wonder "but where did this regression come from?": https://build.suse.de/request/show/235608 was released 2 months ago. It created fs-tmp.conf, which sorts before tmp.conf and therefore takes precedence, so your tmp.conf entries were ignored as duplicates.
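To check which file actually wins for a given path on a host, the two precedence rules from tmpfiles.d(5) can be modelled in shell: a file in an earlier directory (such as /etc) masks a same-named file in a later one (such as /usr/lib), and among conflicting entries the lexicographically earliest file name applies. This is a simplified sketch (it ignores line types, `!` modifiers and specifiers, and assumes file names without whitespace); `tmpfiles_winner` is a hypothetical helper, not part of systemd:

```shell
# Hypothetical helper modelling tmpfiles.d(5) precedence for one PATH.
# Usage: tmpfiles_winner PATH DIR...   (DIRs in precedence order)
tmpfiles_winner() {
    target=$1; shift
    # Unique .conf basenames across all directories, in byte order
    names=$(for d in "$@"; do
                for f in "$d"/*.conf; do
                    [ -e "$f" ] && basename "$f"
                done
            done | LC_ALL=C sort -u)
    for name in $names; do
        for d in "$@"; do
            [ -e "$d/$name" ] || continue
            # Only the highest-precedence copy of this name is consulted;
            # field 2 of a non-comment tmpfiles.d line is the path
            if awk -v p="$target" '$1 !~ /^#/ && $2 == p { found = 1 }
                                   END { exit !found }' "$d/$name"; then
                printf '%s\n' "$d/$name"
                return 0
            fi
            break
        done
    done
    return 1
}
```

For example, `tmpfiles_winner /tmp /etc/tmpfiles.d /run/tmpfiles.d /usr/lib/tmpfiles.d`. Before the split it would be expected to report /usr/lib/tmpfiles.d/fs-tmp.conf on the affected hosts, and /etc/tmpfiles.d/fs-tmp.conf afterwards.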
I've split the files and removed the tmp.conf file completely, because the remaining systemd-related paths/config don't differ from the ones under /usr/lib/tmpfiles.d/ (provided by the filesystem package). After restarting systemd-tmpfiles-clean.service the log messages are indeed gone. Should I add this kind of config to salt? The old tmp.conf wasn't in salt either. (SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488)
I've also checked o3, where we have the same problem plus an override for the web UI's temp dir. I did the splitting there as well.
I merged https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/488. Please keep in mind #59391, where we already added tmpfile config but subsequently removed it again after a "tmpfile leak" in openQA was fixed. Not sure why we (or maybe just I) decided to remove the tmpfile handling again.
Looks like the change worked.
The other ticket is about the worker host. I can still try to avoid the leftover tmpfiles in the code. It would be easy to at least clean up the tmpfiles of a worker's previous job before starting a new one.
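Cleaning up the previous job's temp files before the next job starts could look roughly like the following. It only sketches the pattern and is not how openQA's worker (written in Perl) does it: the state file and `start_job_tmpdir` are hypothetical, and it assumes each job's temp dir is created with `mktemp -d` under `$TMPDIR`:

```shell
# Hypothetical sketch: remember the current job's temp dir in STATE_FILE
# and remove the previous job's temp dir before creating a new one.
start_job_tmpdir() {
    state_file=$1
    if [ -f "$state_file" ]; then
        prev=$(cat "$state_file")
        case $prev in
            *..*) ;;  # refuse paths with parent-directory references
            # only delete directories below $TMPDIR (or /tmp)
            "${TMPDIR:-/tmp}"/?*) [ -d "$prev" ] && rm -rf -- "$prev" ;;
        esac
    fi
    new=$(mktemp -d) || return 1
    printf '%s\n' "$new" > "$state_file"
    printf '%s\n' "$new"
}
```

The `case` guard reduces the risk that a corrupted state file points `rm -rf` outside the temp area.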
Yes, the other ticket is about the worker, so only "remotely related". If you see something that could be done about tempfiles on the central webUI instance, you can do that as part of this ticket. For anything on the worker I suggest a separate ticket, which we do not need to care about soon.