action #92338
closed [Alerting] File systems alert, / on osd
100%
Description
Observation
[Alerting] File systems alert
One of the file systems is too full
Metric name | Value
/: Used Percentage | 90.049
see https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1
output of du -x --max-depth=1 -BM | sort -n
1M ./lost+found
1M ./mnt
1M ./selinux
1M ./storage
2M ./bin
6M ./sbin
12M ./lib64
26M ./etc
46M ./root
99M ./boot
147M ./opt
1085M ./lib
1284M ./var
4128M ./usr
10083M ./tmp
16915M .
It seems /tmp is now by far the biggest contributor. There are a lot of temporary directories like 6TyfduRNJ6, the oldest one from 2021-03-24 03:48. Unfortunately there are hardly any logs going back in time, likely because / is so full that the systemd journal does not retain more. 2021-03-24 is not a date on which we commonly reboot the system automatically, so I am not sure if non-openQA package upgrades caused a change.
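To find out when the leftovers started piling up, the top-level entries of /tmp can be listed by modification time, oldest first. The following is only a sketch: the helper name list_old_tmp_entries and the 30-day default are assumptions made for illustration, and -printf requires GNU find.

```shell
# Hypothetical helper: list top-level entries of a directory that were last
# modified more than N days ago, oldest first.
#   $1: directory to inspect (default: /tmp)
#   $2: minimum age in days (default: 30)
list_old_tmp_entries() {
    # -xdev stays on one file system, like `du -x`;
    # %T+ prints the modification time in a sortable format (GNU find).
    find "${1:-/tmp}" -xdev -mindepth 1 -maxdepth 1 \
        -mtime "+${2:-30}" -printf '%T+ %p\n' | sort
}
```

Calling `list_old_tmp_entries /tmp 30 | head` would then show the oldest candidates first.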
I found some files like tmp.vvDweS5srZ which look like autoinst-log.txt or worker-log.txt. I assume that these are temporary files from openqa-investigate.
Updated by okurz over 3 years ago
- Assignee set to okurz
- Priority changed from Urgent to High
I deleted some directories and files on osd. Likely a similar problem can exist on osd. Maybe https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues#L134 combined with an unexpected exit of the script could be the problem. The tempfile is deleted, but only if the function exits successfully. I assume we should ensure that the file is deleted in an EXIT handler.
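A minimal sketch of such an EXIT handler follows. It is not the code from openqa-label-known-issues: the function name label_job and its arguments are hypothetical stand-ins. The function body is written as a subshell (parentheses instead of braces) so that the trap fires whenever the function ends, whether it succeeds or fails.

```shell
# Hypothetical stand-in for a function that works on a temporary log file.
#   $1: job id (illustrative only)
#   $2: pass "fail" to simulate an unexpected exit
label_job() (
    tmp=$(mktemp)
    # Runs when this subshell exits, on success and on failure alike.
    trap 'rm -f "$tmp"' EXIT
    echo "log content for job $1" > "$tmp"
    # Simulated unexpected failure before the normal end of the function.
    [ "$2" != fail ] || exit 1
    # Normal processing: count the matching lines in the temporary file.
    grep -c 'log content' "$tmp"
)
```

With a plain `rm` at the end of the function instead of the trap, the simulated failure would leave the file behind in /tmp, which is the kind of leftover described above.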
Updated by okurz over 3 years ago
- Status changed from New to Feedback
Updated by okurz over 3 years ago
- Status changed from Feedback to Blocked
I will track https://github.com/os-autoinst/scripts/pull/72 in #92341, after finding that a directory like /tmp/FOWvYnWzKt from 2021-03-24 05:45, the first non-empty directory, has the following content:
1616561148_719579.png autoinst-log-live.txt last.png serial-terminal-live.txt
This looks more like a regression in openQA or some dependency. Created #92344.
I also deleted many temporary files and directories on osd so that we are back to 38% usage now, with 12G available.
Updated by okurz over 3 years ago
- Status changed from Blocked to Feedback
Updated by okurz over 3 years ago
- Status changed from Feedback to Resolved
The MR has been merged and in effect for some days. The state of /tmp on osd looks fine.