Ticket #134960 (closed, 100% done)
monitor.i.o.o: logrotate failures due to low free disk space
Description
A couple of days ago monitor.i.o.o started alerting in Icinga about low free disk space, and now logrotate has joined the club and started failing because of it.
Updated by luc14n0 over 1 year ago
- Private changed from Yes to No
According to du, these are the beefiest directories under /var:
362G /var
148G /var/lib
131G /var/backup
131G /var/backup/influxdb
81G /var/log
69G /var/lib/influxdb
69G /var/lib/influxdb/data
69G /var/lib/influxdb/data/icinga2
69G /var/lib/influxdb/data/icinga2/autogen
60G /var/lib/prometheus
60G /var/lib/prometheus/metrics
38G /var/backup/influxdb/20230827
38G /var/backup/influxdb/20230828
38G /var/log/icinga2
38G /var/log/icinga2/compat
38G /var/log/icinga2/compat/archives
35G /var/log/opensuse
35G /var/log/opensuse/hosts
31G /var/log/opensuse/hosts/OLD
29G /var/backup/influxdb/20230831
16G /var/backup/influxdb/20230830
13G /var/lib/elasticsearch
13G /var/lib/elasticsearch/nodes
13G /var/lib/elasticsearch/nodes/0
13G /var/lib/elasticsearch/nodes/0/indices
9.0G /var/backup/influxdb/20230829
6.3G /var/log/opensuse/hosts/OLD/2019
4.8G /var/lib/pnp4nagios
4.8G /var/lib/pnp4nagios/perfdata
3.4G /var/cache
3.3G /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7
3.2G /var/cache/icinga2
3.2G /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN
3.1G /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT
3.1G /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA
3.1G /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD
3.1G /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT
3.1G /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB
3.1G /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7/chunks
3.0G /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA/chunks
3.0G /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD/chunks
3.0G /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB/chunks
3.0G /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN/chunks
2.9G /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19
2.9G /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT/chunks
2.9G /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT/chunks
2.9G /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH
2.9G /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90
2.8G /var/backup/influxdb/20230826
2.8G /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH/chunks
2.8G /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P
2.8G /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ
2.8G /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF
2.8G /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA
2.8G /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0
2.8G /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7
2.8G /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR
2.8G /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3
2.7G /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19/chunks
2.7G /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90/chunks
2.6G /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD
2.6G /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P/chunks
2.6G /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ/chunks
2.6G /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF/chunks
2.6G /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA/chunks
2.6G /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0/chunks
2.6G /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7/chunks
2.6G /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR/chunks
2.6G /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3/chunks
2.4G /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD/chunks
2.4G /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0
2.2G /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0/chunks
1.8G /var/log/icinga
1.8G /var/log/icinga/archives
1.8G /var/log/opensuse/hosts/OLD/2020
1.7G /var/lib/mysql
1.6G /var/lib/elasticsearch/nodes/0/indices/Yi8jXZsnRQy2iqBfUzaUcg
1.5G /var/lib/mysql/icinga
1.1G /var/log/journal
1.1G /var/log/journal/e9dce331a4f2602b6b67b145585d2b7f
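The exact command was not posted in the ticket. A listing like the one above can be reproduced with something along these lines; the function name `biggest_dirs` and the default of 80 entries are only illustrative, and `sort -h` assumes GNU coreutils:

```shell
#!/bin/sh
# biggest_dirs DIR [COUNT]: print the COUNT largest directories below DIR,
# biggest first, with human-readable sizes.
# Hypothetical helper, not something found on monitor.i.o.o.
biggest_dirs() {
    dir=${1:-/var}
    count=${2:-80}
    # -x stays on one filesystem, -h prints sizes like 131G or 4.8G;
    # sort -rh orders those human-readable sizes from largest to smallest.
    du -xh "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Calling `biggest_dirs /var 80` as root would give a ranking comparable to the one pasted here.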
From the contents of /root/bin, it seems we're keeping those backups locally, is that so? How can we improve the situation there, using backup.i.o.o?
Updated by luc14n0 over 1 year ago
- File deleted (clipboard-202308312141-csa4x.png)
Updated by luc14n0 over 1 year ago
- Status changed from Workable to New
From the contents of /root/bin, it seems we're keeping those backups locally, is that so? How can we improve the situation there, using backup.i.o.o?
I might have had a wrong assumption about mybackup.i.o.o.
Updated by crameleon over 1 year ago
backup.i.o.o is a good target for longer term storage of backups. mybackup.i.o.o confused me in the past as well, I think "my" refers to MySQL.
Updated by luc14n0 over 1 year ago
Would taking the logs from previous years and making a tarball out of them, to compress the data further, be a quick way to gain some space for now?
Updated by crameleon over 1 year ago
It would, but I think there is no point in keeping old log files if they're not used for any statistics gathering.
Updated by luc14n0 over 1 year ago
That's a good point. I'm going to show that old stuff the exit door then.
Updated by luc14n0 over 1 year ago
- Checklist item Double check anything under /var/log is being rotated added
- Status changed from New to Workable
- Assignee set to luc14n0
- % Done changed from 0 to 80
OK. After a brief exchange with Georg on IRC, I took the liberty of deleting:
31G /var/log/opensuse/hosts/OLD
There's not much point in keeping those old logs (oldest were from 2019) if no analyses are being made with them, so they are just "rubbish", gathering dust.
Another piece of rubbish that's gathering a lot of dust is:
38G /var/log/icinga2/compat
38G /var/log/icinga2/compat/archives
Those are Icinga1 compatible logs. I bade farewell to them too. Plus I disabled them altogether with:
icinga2 feature disable compatlog
Something that might be worth mentioning is that systemctl status icinga2 shows:
Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip does not support version 3/4 packets
Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip accepted a version 2 packet
Deleting those two directories got us ~70G of free space back for /var.
Updated by luc14n0 over 1 year ago
- Checklist item Double check anything under /var/log is being rotated set to Done
- Status changed from Workable to Resolved
- % Done changed from 80 to 100
Alright, found another dust gatherer:
1.8G /var/log/icinga/archives
That directory contains old logs dating from 2017 to 2020. Sayounara to them. We should be good, for now.
P.S.:
The whole of /var/log/icinga was from legacy Icinga 1, so I deleted it.
And I manually compressed logs under /var/log that were left uncompressed due to logrotate failures in the past.
Updated by luc14n0 over 1 year ago
- Status changed from Resolved to Feedback
OK. Running systemctl status logrotate showed many of these:
Sep 02 00:51:33 monitor logrotate[3039]: error: error accessing /var/log/opensuse/hosts/OLD/: No such file or directory
How careless of me!
I re-created the /var/log/opensuse/hosts/OLD directory. /etc/logrotate.d/rsyslog-remote is the conf that rotates those host logs, with the following postrotate script:
postrotate
    STARTDIR="/var/log/opensuse/hosts"
    LOGS=$(echo $STARTDIR/OLD/*.log.*)
    for f in $LOGS ; do
        if [ -e "$f" ] ; then
            YEAR=$(date +"%Y")
            TARGETDIR="$STARTDIR/OLD/$YEAR"
            test -d "$TARGETDIR" || mkdir -p "$TARGETDIR"
            /usr/bin/xz "$f"
            mv "$f.xz" "$TARGETDIR"/
        fi
    done
    /usr/bin/systemctl reload rsyslog
endscript
The conf uses maxage 365, but that doesn't seem to clean up old stuff, as there were logs dating back to 2019. We could improve the script to make sure anything older than a year gets cleaned up. Any thoughts?
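One possible shape for such a cleanup, as a sketch only: delete compressed logs under OLD/ whose modification time is more than a year in the past, then drop year directories that end up empty. The helper name `purge_old_logs` and the `*.xz` filter are assumptions and not part of the current conf; `-mtime`, `-mindepth`, `-empty` and `-delete` are GNU find options.

```shell
#!/bin/sh
# purge_old_logs DIR [DAYS]: remove *.xz files below DIR that are older than
# DAYS days (default 365), then remove subdirectories left empty.
# Hypothetical helper, not taken from /etc/logrotate.d/rsyslog-remote.
purge_old_logs() {
    dir=$1
    days=${2:-365}
    # Nothing to do if the directory is missing.
    [ -d "$dir" ] || return 0
    # -mtime +N matches files older than N days (find rounds the age down
    # to whole days before comparing).
    find "$dir" -type f -name '*.xz' -mtime +"$days" -delete
    # -mindepth 1 keeps DIR itself in place, since logrotate complains
    # when OLD/ does not exist.
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

In the postrotate script the two find calls could run after the loop, with "$STARTDIR/OLD" as the directory. Since xz and mv both preserve the modification time, the age test applies to the last write to the original log rather than to the time it was archived.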
Updated by crameleon over 1 year ago
Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script, logrotate is perfectly able to compress logs itself. The sorting into subdirectories by year only makes sense if there is a reason to keep year old logs in the first place.
Updated by crameleon over 1 year ago
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
Updated by luc14n0 over 1 year ago
crameleon wrote in #note-12:
Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script, logrotate is perfectly able to compress logs itself. The sorting into subdirectories by year only makes sense if there is a reason to keep year old logs in the first place.
Yeah. No problem.
The comment in the conf file says:
# compress doesn't work here as it tries to rotate the
# logfile in OLD/ which is moved away already in
# postrotate
Updated by luc14n0 over 1 year ago
crameleon wrote in #note-13:
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
No objections from my side.
Updated by pjessen_invalid over 1 year ago
crameleon wrote in #note-13:
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.
Updated by luc14n0 over 1 year ago
pjessen_invalid wrote in #note-16:
crameleon wrote in #note-13:
I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.
Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.
That looks interesting. I'm gonna take a look at that.