Ticket #134960 (closed, 100% done)
monitor.i.o.o: logrotate failures due to low free disk space
Description
A couple of days ago monitor.i.o.o started alerting in Icinga about low free disk space, and now logrotate
has joined the club and started to fail because of that too.
Updated by luc14n0 9 months ago
- Private changed from Yes to No
According to du, these are the beefiest directories under /var:
362G /var
148G /var/lib
131G /var/backup
131G /var/backup/influxdb
81G /var/log
69G /var/lib/influxdb
69G /var/lib/influxdb/data
69G /var/lib/influxdb/data/icinga2
69G /var/lib/influxdb/data/icinga2/autogen
60G /var/lib/prometheus
60G /var/lib/prometheus/metrics
38G /var/backup/influxdb/20230827
38G /var/backup/influxdb/20230828
38G /var/log/icinga2
38G /var/log/icinga2/compat
38G /var/log/icinga2/compat/archives
35G /var/log/opensuse
35G /var/log/opensuse/hosts
31G /var/log/opensuse/hosts/OLD
29G /var/backup/influxdb/20230831
16G /var/backup/influxdb/20230830
13G /var/lib/elasticsearch
13G /var/lib/elasticsearch/nodes
13G /var/lib/elasticsearch/nodes/0
13G /var/lib/elasticsearch/nodes/0/indices
9.0G /var/backup/influxdb/20230829
6.3G /var/log/opensuse/hosts/OLD/2019
4.8G /var/lib/pnp4nagios
4.8G /var/lib/pnp4nagios/perfdata
3.4G /var/cache
3.3G /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7
3.2G /var/cache/icinga2
3.2G /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN
3.1G /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT
3.1G /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA
3.1G /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD
3.1G /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT
3.1G /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB
3.1G /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7/chunks
3.0G /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA/chunks
3.0G /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD/chunks
3.0G /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB/chunks
3.0G /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN/chunks
2.9G /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19
2.9G /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT/chunks
2.9G /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT/chunks
2.9G /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH
2.9G /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90
2.8G /var/backup/influxdb/20230826
2.8G /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH/chunks
2.8G /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P
2.8G /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ
2.8G /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF
2.8G /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA
2.8G /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0
2.8G /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7
2.8G /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR
2.8G /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3
2.7G /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19/chunks
2.7G /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90/chunks
2.6G /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD
2.6G /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P/chunks
2.6G /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ/chunks
2.6G /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF/chunks
2.6G /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA/chunks
2.6G /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0/chunks
2.6G /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7/chunks
2.6G /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR/chunks
2.6G /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3/chunks
2.4G /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD/chunks
2.4G /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0
2.2G /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0/chunks
1.8G /var/log/icinga
1.8G /var/log/icinga/archives
1.8G /var/log/opensuse/hosts/OLD/2020
1.7G /var/lib/mysql
1.6G /var/lib/elasticsearch/nodes/0/indices/Yi8jXZsnRQy2iqBfUzaUcg
1.5G /var/lib/mysql/icinga
1.1G /var/log/journal
1.1G /var/log/journal/e9dce331a4f2602b6b67b145585d2b7f
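The exact command behind the listing above isn't shown in the ticket. A minimal sketch that produces the same kind of output, assuming GNU coreutils (for sort -h); the function name top_dirs and its defaults are made up for illustration:

```shell
#!/bin/sh
# List the largest directories below a starting point, biggest first.
# Usage: top_dirs DIR [COUNT]
# NOTE: top_dirs is a hypothetical helper, not something from the ticket.
top_dirs() {
    dir=${1:-/var}
    count=${2:-80}
    # -x keeps du on a single filesystem; -h prints human-readable sizes,
    # which GNU sort -h knows how to order. Permission errors are hidden.
    du -xh "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Calling top_dirs /var 80 as root should give a ranking comparable to the one above.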
From the contents of /root/bin, it seems we're keeping those backups locally. Is that so? How can we improve the situation there, using backup.i.o.o?
Updated by luc14n0 9 months ago
- Checklist item Double check anything under /var/log is being rotated added
- Status changed from New to Workable
- Assignee set to luc14n0
- % Done changed from 0 to 80
OK. After a brief exchange with Georg on IRC, I took the liberty of deleting:
31G /var/log/opensuse/hosts/OLD
There's not much point in keeping those old logs (oldest were from 2019) if no analyses are being made with them, so they are just "rubbish", gathering dust.
Another pile of rubbish that's gathering a lot of dust is:
38G /var/log/icinga2/compat
38G /var/log/icinga2/compat/archives
Those are Icinga 1-compatible logs. I bade farewell to them too. Plus, I disabled them altogether with:
icinga2 feature disable compatlog
Something that might be worth mentioning is that systemctl status icinga2 shows:
Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip does not support version 3/4 packets
Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip accepted a version 2 packet
Altogether, these deletions got us ~70G of free space back for /var.
Updated by luc14n0 9 months ago
- Checklist item Double check anything under /var/log is being rotated set to Done
- Status changed from Workable to Resolved
- % Done changed from 80 to 100
Alright, found another dust gatherer:
1.8G /var/log/icinga/archives
That directory contains old logs dating from 2017 to 2020. Sayounara to them. We should be good, for now.
P.S.:
The whole of /var/log/icinga was from legacy Icinga 1, so I deleted it.
I also manually compressed the logs under /var/log that had been left uncompressed by past logrotate failures.
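For reference, a rough sketch of how such leftover logs could be compressed in one go. It rests on two assumptions not stated in the ticket: rotated files follow logrotate's dateext naming (a -YYYYMMDD suffix), and gzip is an acceptable compressor (swap in xz if preferred). The function name compress_rotated is made up:

```shell
#!/bin/sh
# Compress rotated logs that were left uncompressed.
# Usage: compress_rotated DIR
# NOTE: compress_rotated is a hypothetical helper; the -YYYYMMDD pattern
# and the use of gzip are assumptions, not taken from the ticket.
compress_rotated() {
    # Only names ending in eight digits after a dash match, so live logs
    # and files already ending in .gz/.xz/.bz2 are left untouched.
    find "$1" -type f -name '*-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' \
        -exec gzip -9 {} +
}
```

Running the find part with -print instead of -exec first is a cheap way to check what would be touched.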
Updated by luc14n0 9 months ago
- Status changed from Resolved to Feedback
OK. Running systemctl status logrotate showed many of these:
Sep 02 00:51:33 monitor logrotate[3039]: error: error accessing /var/log/opensuse/hosts/OLD/: No such file or directory
How careless of me!
I recreated the /var/log/opensuse/hosts/OLD directory. /etc/logrotate.d/rsyslog-remote is the config that rotates those host logs, with the following postrotate script:
postrotate
    STARTDIR="/var/log/opensuse/hosts"
    LOGS=$(echo $STARTDIR/OLD/*.log.*)
    for f in $LOGS ; do
        if [ -e "$f" ] ; then
            YEAR=$(date +"%Y")
            TARGETDIR="$STARTDIR/OLD/$YEAR"
            test -d "$TARGETDIR" || mkdir -p "$TARGETDIR"
            /usr/bin/xz "$f"
            mv "$f.xz" "$TARGETDIR"/
        fi
    done
    /usr/bin/systemctl reload rsyslog
endscript
It uses maxage 365, but that doesn't seem to be cleaning up old stuff, as there were logs dating back to 2019. We could improve that script to make sure anything older than a year gets cleaned up. Any thoughts?
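One possible shape for that cleanup, as a sketch only. Presumably maxage never sees these files because the postrotate script moves them into OLD/$YEAR, out of logrotate's view, so the script would have to prune them itself. The function name purge_old_logs is made up, the *.xz pattern and directory layout are inferred from the script above, and -mindepth and -delete assume GNU (or BSD) find:

```shell
#!/bin/sh
# Delete archived logs older than MAX_DAYS and prune emptied year directories.
# Usage: purge_old_logs ARCHIVE_DIR [MAX_DAYS]
# NOTE: purge_old_logs is a hypothetical helper, not part of the existing config.
purge_old_logs() {
    archive=$1
    max_days=${2:-365}
    [ -d "$archive" ] || return 0
    # Only look inside the per-year subdirectories (depth 2 and below).
    # -mtime +N matches files last modified more than N full days ago.
    find "$archive" -mindepth 2 -type f -name '*.xz' -mtime +"$max_days" -delete
    # Drop year directories that are now empty, but never ARCHIVE_DIR itself,
    # since logrotate errors out when OLD/ is missing.
    find "$archive" -mindepth 1 -type d -empty -delete
}
```

Inside the postrotate block this could be called as purge_old_logs "$STARTDIR/OLD" 365 after the for loop, provided the function is defined in the same script.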
Updated by crameleon 9 months ago
Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script; logrotate is perfectly able to compress logs itself. Sorting into subdirectories by year only makes sense if there is a reason to keep year-old logs in the first place.
Updated by luc14n0 9 months ago
crameleon wrote in #note-12:
Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script; logrotate is perfectly able to compress logs itself. Sorting into subdirectories by year only makes sense if there is a reason to keep year-old logs in the first place.
Yeah. No problem.
The comment in the conf file says:
# compress doesn't work here as it tries to rotate the
# logfile in OLD/ which is moved away already in
# postrotate
Updated by pjessen_invalid 9 months ago
crameleon wrote in #note-13:
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.
Updated by luc14n0 9 months ago
pjessen_invalid wrote in #note-16:
crameleon wrote in #note-13:
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.
That looks interesting. I'm gonna take a look at that.