action #174313
closed[o3][zabbix][alert] / and /var/tmp: "Disk space is low and might be full in 7d (used > 85%)" since 2024-12-11 06:50 size:S
Description
Observation
2024-12-11 06:50:26 Warning PROBLEM ariel.dmz-prg2.suse.org /var/tmp: Disk space is low and might be full in 7d (used > 85%) 1d 9h 40m No Application: Filesystem /var/tmp
2024-12-11 06:50:23 Warning PROBLEM ariel.dmz-prg2.suse.org /: Disk space is low and might be full in 7d (used > 85%) 1d 9h 40m No Application: Filesystem /
Suggestions
- We're keeping a long list of old packages in /var/cache/zypp/packages/, going back to February 2023
- Research whether zypper provides options to limit this cache; otherwise add a custom systemd service or extend openqa-auto-update to remove older cached packages based on number and/or age
- Ensure that this frees up enough space and crosscheck the alert on zabbix again
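To crosscheck the alert after a cleanup, the usage of the affected mount points can be compared against the 85% threshold from the alert text. This is a minimal sketch; the function names `usage_percent` and `above_threshold` are made up for illustration and are not part of any existing tooling:

```shell
# Print the used percentage (digits only) of the filesystem containing $1.
# Uses the POSIX output format of df, where column 5 is the capacity.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Succeed if the usage of $1 exceeds threshold $2 (default 85, as in the alert).
# Returns 2 if the usage could not be determined.
above_threshold() (
    used=$(usage_percent "$1")
    [ -n "$used" ] || exit 2
    [ "$used" -gt "${2:-85}" ]
)
```

For example, `for m in / /var/tmp; do above_threshold "$m" && echo "$m is still above 85%"; done` prints only the mount points that would keep the alert active.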
Updated by okurz 3 months ago
- Copied to action #174316: [o3][zabbix][alert] no email about zabbix alerts including storage and cpu load size:S added
Updated by okurz 3 months ago
- Related to action #40196: [monitoring] monitor internal port 9526, port 80, external port 443 accessibility of o3 and response times size:M added
Updated by gpathak 3 months ago
- Assignee set to gpathak
The /var directory is taking up 11GiB.
gpathak@ariel:~> sudo du -ahcx /var/ | sort -hr | head
11G /var/
11G total
8.5G /var/cache
8.4G /var/cache/zypp
8.3G /var/cache/zypp/packages
7.7G /var/cache/zypp/packages/devel_openQA
5.3G /var/cache/zypp/packages/devel_openQA/x86_64
2.4G /var/cache/zypp/packages/devel_openQA/noarch
1.3G /var/log
910M /var/log/journal/06446c641307496183dfdf8dccebdceb
gpathak@ariel:~>
The /var/log directory is 1.3GiB.
Updated by gpathak 3 months ago
The above data and even more insights are available as charts: https://zabbix.suse.de/zabbix.php?action=charts.view&filter_hostids%5B0%5D=10923&filter_show=1&filter_set=1
Updated by tinita 3 months ago
It seems we're keeping a long list of old packages in /var/cache/zypp/packages/. It goes back to February 2023:
ls -lrth /var/cache/zypp/packages/devel_openQA/x86_64/openQA-common-*
-rw-r--r-- 1 root root 459K Feb 15 2023 /var/cache/zypp/packages/devel_openQA/x86_64/openQA-common-4.6.1676474487.945e502-lp154.5577.1.x86_64.rpm
Not sure how to configure a shorter retention period.
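To get an overview of how far back the cache goes without globbing for a single package, the oldest files can be listed with their modification dates. A small sketch, assuming GNU find (for `-printf`); the helper name `oldest_files` is hypothetical:

```shell
# List the N oldest regular files below a directory, oldest first,
# as "YYYY-MM-DD path". Relies on the GNU find -printf extension.
oldest_files() (
    dir=$1
    count=${2:-10}
    find "$dir" -type f -printf '%TY-%Tm-%Td %p\n' | sort | head -n "$count"
)
```

Calling `oldest_files /var/cache/zypp/packages/devel_openQA 5` would show the five oldest cached files of that repository.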
Updated by okurz 3 months ago
- Subject changed from [o3][zabbix][alert] / and /var/tmp: "Disk space is low and might be full in 7d (used > 85%)" since 2024-12-11 06:50 to [o3][zabbix][alert] / and /var/tmp: "Disk space is low and might be full in 7d (used > 85%)" since 2024-12-11 06:50 size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler 2 months ago · Edited
We're keeping the packages of the following repos indefinitely:
grep -iR keeppackages=1 /etc/zypp/repos.d
/etc/zypp/repos.d/devel_openQA.repo:keeppackages=1
/etc/zypp/repos.d/devel_openQA_Leap.repo:keeppackages=1
Not sure whether zypper has a way of specifying the number of packages to keep. For now I just used `find /var/cache/zypp/packages -ipath 'devel_openqa' -mtime +365 -delete` to delete everything older than a year.
I can set up a systemd service/timer to invoke a command like that periodically. I can also set keeppackages=0, but we probably enabled this for the sake of easier downgrades, so this is probably not a good solution.
One could also add the following to openqa-auto-update:
if [[ $OPENQA_PACKAGE_CACHE_RETENTION ]]; then
    find /var/cache/zypp/packages -type f -ipath '*devel*openQA*' -mtime "+$OPENQA_PACKAGE_CACHE_RETENTION" -delete
fi
Of course this breaks if one uses a different repository name or a different packagesdir. So it is probably not the best idea to add it to the generic openqa-auto-update script.
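Before running such a command with `-delete`, a dry run that only prints the matches can help. Note that `-ipath` matches the pattern against the whole path, so a pattern such as 'devel_openqa' without surrounding wildcards matches nothing (the asterisks in the one-off command above may simply have been lost in rendering). The following sketch assumes GNU find; the function name `stale_packages` and its parameters are invented for illustration:

```shell
# Print cached package files that an age-based cleanup would remove.
# $1: cache directory
# $2: case-insensitive glob matched against the whole path
#     (hence the surrounding wildcards)
# $3: minimum age in days
stale_packages() {
    find "$1" -type f -ipath "$2" -mtime "+$3" -print
}
```

For example, `stale_packages /var/cache/zypp/packages '*devel*openQA*' 365` lists what the one-year cleanup would touch; replacing `-print` with `-delete` turns the dry run into the real thing.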
Updated by openqa_review 2 months ago
- Due date set to 2025-01-02
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler 2 months ago
- Status changed from In Progress to Feedback
I added, tested and enabled a simple systemd service/timer on ariel:
martchus@ariel:~> cat /etc/systemd/system/package-cleanup.service
[Unit]
Description=Cleans up old packages in zypper cache directory
[Service]
Type=oneshot
ExecStart=find /var/cache/zypp/packages -type f -ipath '*devel*openQA*' -mtime +100 -delete
martchus@ariel:~> cat /etc/systemd/system/package-cleanup.timer
[Unit]
Description=Cleans up old packages in zypper cache directory
[Timer]
OnBootSec=15min
OnUnitActiveSec=1w
[Install]
WantedBy=timers.target
This is probably simple enough that we don't need to manage it in a repository. (We do have backups of /etc on ariel via the backup VM.)
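For completeness, units like the two above are typically activated along the following lines. This is only a sketch of the usual systemctl steps, not a record of what was actually run on ariel; it needs root, and the wrapper function name `enable_cleanup_timer` is made up:

```shell
# Reload unit files, enable and start the timer, then show its next run.
# The unit name is the one from the files shown above.
enable_cleanup_timer() {
    systemctl daemon-reload &&
        systemctl enable --now package-cleanup.timer &&
        systemctl list-timers package-cleanup.timer
}
```

A one-off test run of the cleanup itself is possible with `systemctl start package-cleanup.service`, since the service is of type oneshot.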
Updated by okurz 2 months ago
- Due date changed from 2025-01-02 to 2025-01-24
- Status changed from Feedback to Workable
- Priority changed from High to Normal
mkittler wrote in #note-9:
We're keeping the packages of the following repos indefinitely:
grep -iR keeppackages=1 /etc/zypp/repos.d
/etc/zypp/repos.d/devel_openQA.repo:keeppackages=1
/etc/zypp/repos.d/devel_openQA_Leap.repo:keeppackages=1
Not sure whether zypper has a way of specifying the number of packages to keep. For now I just used `find /var/cache/zypp/packages -ipath 'devel_openqa' -mtime +365 -delete` to delete everything older than a year.
I can setup a systemd service/timer to invoke a command like that periodically. I can also set keeppackages=0 but we probably enabled this for the sake of easier downgrades. So this is probably not a good solution.
One could also add the following to openqa-auto-update:
if [[ $OPENQA_PACKAGE_CACHE_RETENTION ]]; then
    find /var/cache/zypp/packages -type f -ipath '*devel*openQA*' -mtime "+$OPENQA_PACKAGE_CACHE_RETENTION" -delete
fi
Of course this breaks if one uses a different repository name or a different packagesdir. So it is probably not the best idea to add it to the generic openqa-auto-update script.
Well, openqa-auto-update is openQA-specific, at least in name, and it also calls https://github.com/os-autoinst/openQA/blob/master/script/openqa-check-devel-repo which uses devel:openQA, so I guess it's a good idea to cover that in the script. Also, I wouldn't use the mtime, at least not alone. If for whatever reason no upgrade was conducted for 4 months and then a faulty upgrade is conducted, any older version would already have been pruned. How about something like find -mtime +100 | tail -n +$OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN to keep at least OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN package files (careful, that's not that many versions as we have many subpackages).
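One way to read this suggestion is: sort all cached files by modification time, always spare the newest few, and only then apply the age limit to the rest. The sketch below illustrates that interpretation; it is not the code that was eventually merged. It assumes GNU find and file names without newlines, and the name `prune_candidates` is hypothetical:

```shell
# Print files below $1 that are older than $2 days, but never any of the
# $3 most recently modified files. Uses the GNU find -printf extension.
prune_candidates() (
    dir=$1
    days=$2
    keep_min=$3
    cutoff=$(( $(date +%s) - days * 86400 ))
    # "%T@ %p" prints the mtime as seconds since the epoch, then the path
    find "$dir" -type f -printf '%T@ %p\n' |
        sort -rn |
        tail -n "+$((keep_min + 1))" |
        awk -v cutoff="$cutoff" '$1 < cutoff { sub(/^[^ ]+ /, ""); print }'
)
```

With `prune_candidates /var/cache/zypp/packages 100 50`, the 50 newest files survive even if the last upgrade happened more than 100 days ago. As noted above, the count is per file and not per package, so it covers fewer versions than the number suggests.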
Updated by livdywan about 2 months ago
Let's block on #174316 before trying to adjust the numbers
Updated by jbaier_cz about 2 months ago
A side note: /var/tmp actually looks to be on the same filesystem as /, so it seems that zabbix wrongly detected it twice.
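Whether two paths really share a filesystem can be checked by comparing the source that df reports for each of them. This is a rough sketch with hypothetical helper names; it is meant for device-backed filesystems, since separate pseudo filesystems such as two tmpfs mounts would report the same source name:

```shell
# Print the source (device) of the filesystem containing $1.
fs_source() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Succeed if both paths are backed by the same source.
same_filesystem() {
    [ "$(fs_source "$1")" = "$(fs_source "$2")" ]
}
```

For example, `same_filesystem / /var/tmp && echo "one filesystem, reported twice"`.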
Updated by jbaier_cz about 2 months ago · Edited
And for reference, it is a bug in the provided systemd unit, see https://github.com/voxpupuli/puppet-zabbix/issues/320 for more context. I adjusted the unit file to fix that issue.
Updated by tinita about 2 months ago
jbaier_cz wrote in #note-16:
and for the reference, it is a bug in the provided systemd unit, see https://github.com/voxpupuli/puppet-zabbix/issues/320 for more context. I adjusted the unit file to fix that issue.
Could you write down here the change you made? I don't really get it.
Updated by mkittler about 2 months ago
Well, as openqa-auto-update is openQA-specific, at least in the name, but also because it calls https://github.com/os-autoinst/openQA/blob/master/script/openqa-check-devel-repo which uses devel:openQA …
I looked into what we do so far as well. The problem is not that the new code is specific to openQA and our concrete packaging. The existing code is as well - which makes sense because it is part of the concrete packaging.
The problem is what I mentioned in #174313#note-9:
Of course this breaks if one uses a different repository name or a different packagesdir.
In other words, the new code is specific to the concrete local repository setup and zypper configuration. I considered making it read the relevant bits from the zypper config file but found it too involved.
I'm not sure what your code with tail … would achieve. However, I suppose it would indeed make sense to keep a certain number of copies instead of going by time. Ever since I started using openSUSE, I have found it quite annoying that there is no tool like paccache, which I'm used to from Arch Linux (and MSYS2) and which does exactly what you suggested.
Updated by mkittler about 2 months ago · Edited
@okurz What about something like this?
openqa-clean-devel-repo-cache:
#!/bin/bash
set -e
# all settings can be overridden via environment variables
# minimum age in days before a cached file is considered at all
OPENQA_PACKAGE_CACHE_RETENTION=${OPENQA_PACKAGE_CACHE_RETENTION:-100}
# number of matching files to keep per package
OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN=${OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN:-3}
OPENQA_PACKAGE_CACHE_PATH=${OPENQA_PACKAGE_CACHE_PATH:-/hdd/cache/zypp/packages}
OPENQA_PACKAGE_CACHE_REPO_GLOB=${OPENQA_PACKAGE_CACHE_REPO_GLOB:-'*devel*openQA*'}
# split the output of find on newlines only so paths with spaces stay intact
IFS=$'\n'
# candidate files older than the retention period, newest version first
package_files=($(find "$OPENQA_PACKAGE_CACHE_PATH" -type f -ipath "$OPENQA_PACKAGE_CACHE_REPO_GLOB" -mtime "+$OPENQA_PACKAGE_CACHE_RETENTION" | sort -rV))
previous_package_name=
package_count=0
for package_file in "${package_files[@]}"; do
    # group files by package name so versions of one package are counted together
    package_name=$(rpm -q --qf "%{NAME}\n" "$package_file")
    if [[ $package_name != "$previous_package_name" ]]; then
        previous_package_name=$package_name
        package_count=0
    fi
    package_count=$((package_count + 1))
    # dry run: only print what would be removed or kept
    if [[ $package_count -gt "$OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN" ]]; then
        echo "rm " "$package_file"
    else
        echo "keep" "$package_file"
    fi
done
One could also remove the mtime parameter completely. The use of sort -rV should make sure that the newest packages survive. The use of rpm -q --qf "%{NAME}\n" "$package_file" helps to decide which package files are actually the same package (but just different versions).
This script produces sane output on my local system (also when removing the mtime parameter).
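The grouping logic can be tried out without rpm or a real cache by feeding file names on stdin. The sketch below is a simplified stand-in, not the proposed script: it assumes that files are named name-version-release.arch.rpm and derives the package name from the file name, whereas the script above asks rpm for %{NAME}, which is more robust. The function names are made up:

```shell
# Derive the package name from a file named name-version-release.arch.rpm
# by dropping the last two dash-separated fields (an assumption about the
# naming convention, used here instead of an rpm query).
package_name() {
    basename "$1" | sed -E 's/-[^-]+-[^-]+$//'
}

# Read package file paths from stdin and print "keep <path>" for the
# $1 newest versions of each package and "rm <path>" for the rest.
plan_cleanup() (
    keep_min=$1
    previous=
    count=0
    # version sort, reversed, so the newest version of a package comes first
    sort -rV | while IFS= read -r file; do
        name=$(package_name "$file")
        if [ "$name" != "$previous" ]; then
            previous=$name
            count=0
        fi
        count=$((count + 1))
        if [ "$count" -gt "$keep_min" ]; then
            echo "rm $file"
        else
            echo "keep $file"
        fi
    done
)
```

The version sort matters here: a plain lexical sort would rank 4.6.9 above 4.6.10, so the wrong file could end up being removed.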
We still have to decide where to put this script. I suppose we could add it to openqa-auto-update with all the specifics put into variables. I would make it so it doesn't run by default. We could however specify common defaults for OPENQA_PACKAGE_CACHE_PATH and OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN.
Updated by okurz about 2 months ago
mkittler wrote in #note-20:
@okurz What about something like this?
LGTM
One could also remove the mtime parameter completely.
I would keep the mtime as a safety measure.
We still have to decide where to put this script. I suppose we could add it to openqa-auto-update with all the specifics put into variables. I would make it so it doesn't run by default. We could however specify common defaults for OPENQA_PACKAGE_CACHE_PATH and OPENQA_PACKAGE_CACHE_RETENTION_KEEP_MIN.
yes, all that sounds good.
Updated by jbaier_cz about 2 months ago
tinita wrote in #note-18:
jbaier_cz wrote in #note-16:
and for the reference, it is a bug in the provided systemd unit, see https://github.com/voxpupuli/puppet-zabbix/issues/320 for more context. I adjusted the unit file to fix that issue.
Could you write down here the change you made? I don't really get it.
Sure, see systemctl cat zabbix_agentd.service. I just added the following snippet as recommended in the linked issue:
# /etc/systemd/system/zabbix_agentd.service.d/override.conf
[Service]
PrivateTmp=no
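With PrivateTmp enabled, systemd gives the service its own private /tmp and /var/tmp mounts, which is presumably why the agent saw /var/tmp as a separate filesystem. Whether the override is in effect can be checked by querying the unit property. A minimal sketch, assuming a systemd version that supports `--value`; the function name `uses_private_tmp` is invented:

```shell
# Succeed if the given unit is configured with a private /tmp and /var/tmp.
# Relies on "systemctl show --property=... --value".
uses_private_tmp() {
    [ "$(systemctl show --property=PrivateTmp --value "$1")" = "yes" ]
}
```

After adding the drop-in, `systemctl daemon-reload && systemctl restart zabbix_agentd.service` applies it, and `uses_private_tmp zabbix_agentd.service || echo "override active"` confirms the result.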
Updated by mkittler about 2 months ago
- Status changed from Workable to Feedback
PR: https://github.com/os-autoinst/openQA/pull/6104
I have also disabled the timer I previously configured.
Updated by mkittler about 2 months ago
- Status changed from Feedback to Resolved
The PR has been merged and deployed on o3. I also enabled the cleanup there. Currently there's not much to see because there's nothing to be cleaned up. That is expected because the find -mtime +100 … service/timer was still enabled before and we also keep up to 10 versions of each package. I did a dry run of the script with different parameters to see how it behaves on ariel and it seems to work.
Updated by okurz about 2 months ago
- Due date deleted (2025-01-24)
- Status changed from Resolved to Workable
reopening, see #175464-12
Updated by mkittler about 2 months ago
- Status changed from Workable to Rejected
There are plenty of cached packages on ariel, e.g. the command you mentioned (find /var/cache/zypp/ | less) returns many results. If this is about workers (where the cache is indeed empty) then this is completely unrelated because my change to auto-update is not enabled by default and was only enabled on ariel. (If someone has meanwhile enabled it elsewhere, that is not a reason to reopen this ticket.)
Updated by mkittler about 2 months ago
I also just had a look at one of the repo config files on openqaworker23 (/etc/zypp/repos.d/devel_openQA.repo) and I don't see that keeppackages=1 is configured. So a clean cache directory is presumably expected on that machine (and probably others).
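To check this on several hosts, the repo files that enable package caching can be listed directly. This is a small sketch with a made-up function name; it only matches the value 1 seen in this ticket, and other spellings that zypper may accept are not covered:

```shell
# Print the .repo files below $1 (default /etc/zypp/repos.d) that set
# keeppackages=1. Other truthy spellings are deliberately not matched.
repos_keeping_packages() {
    grep -lE '^[[:space:]]*keeppackages[[:space:]]*=[[:space:]]*1[[:space:]]*$' \
        "${1:-/etc/zypp/repos.d}"/*.repo
}
```

An empty result from `repos_keeping_packages` on a worker means that an empty package cache there is expected.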
Updated by okurz about 2 months ago
- Status changed from Rejected to Resolved
Alright, seems like we never had it on workers then
Updated by livdywan about 1 month ago
- Related to action #176145: Preserve package cache on worker hosts added