tickets #134960

closed

monitor.i.o.o logrotate failures due to low free disk space left

Added by luc14n0 9 months ago. Updated 20 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2023-08-31
Due date:
% Done: 100%
Estimated time:

Description

A couple of days ago, monitor.i.o.o started raising alerts in Icinga about low free disk space, and now logrotate has joined the club and started failing because of it too.


Files

Screenshot from 2023-08-31 21-40-14.png (49.6 KB) Icinga monitor luc14n0, 2023-09-01 00:40

Checklist

  • Double check anything under /var/log is being rotated
Actions #1

Updated by luc14n0 9 months ago

  • Private changed from Yes to No

According to du, these are the largest directories under /var:

362G    /var
148G    /var/lib
131G    /var/backup
131G    /var/backup/influxdb
81G     /var/log
69G     /var/lib/influxdb
69G     /var/lib/influxdb/data
69G     /var/lib/influxdb/data/icinga2
69G     /var/lib/influxdb/data/icinga2/autogen
60G     /var/lib/prometheus
60G     /var/lib/prometheus/metrics
38G     /var/backup/influxdb/20230827
38G     /var/backup/influxdb/20230828
38G     /var/log/icinga2
38G     /var/log/icinga2/compat
38G     /var/log/icinga2/compat/archives
35G     /var/log/opensuse
35G     /var/log/opensuse/hosts
31G     /var/log/opensuse/hosts/OLD
29G     /var/backup/influxdb/20230831
16G     /var/backup/influxdb/20230830
13G     /var/lib/elasticsearch
13G     /var/lib/elasticsearch/nodes
13G     /var/lib/elasticsearch/nodes/0
13G     /var/lib/elasticsearch/nodes/0/indices
9.0G    /var/backup/influxdb/20230829
6.3G    /var/log/opensuse/hosts/OLD/2019
4.8G    /var/lib/pnp4nagios
4.8G    /var/lib/pnp4nagios/perfdata
3.4G    /var/cache
3.3G    /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7
3.2G    /var/cache/icinga2
3.2G    /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN
3.1G    /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT
3.1G    /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA
3.1G    /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD
3.1G    /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT
3.1G    /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB
3.1G    /var/lib/prometheus/metrics/01H8MWF9MAJ2TE10DCT6TP9BX7/chunks
3.0G    /var/lib/prometheus/metrics/01H85E321S6AVNMQE440A9QJWA/chunks
3.0G    /var/lib/prometheus/metrics/01H87BWRQWA4T0X6YDGHQT67RD/chunks
3.0G    /var/lib/prometheus/metrics/01H8B7FV9YPR56ERHJP2WV8FKB/chunks
3.0G    /var/lib/prometheus/metrics/01H8D59CX3MVM9ZPX7VMVZKBBN/chunks
2.9G    /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19
2.9G    /var/lib/prometheus/metrics/01H83G9QQZSKYPPEVR5H9V7RRT/chunks
2.9G    /var/lib/prometheus/metrics/01H899P976WH0A1YKF1RMEMPNT/chunks
2.9G    /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH
2.9G    /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90
2.8G    /var/backup/influxdb/20230826
2.8G    /var/lib/prometheus/metrics/01H8F32QWW467WSS7YS2TADVKH/chunks
2.8G    /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P
2.8G    /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ
2.8G    /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF
2.8G    /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA
2.8G    /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0
2.8G    /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7
2.8G    /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR
2.8G    /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3
2.7G    /var/lib/prometheus/metrics/01H81JG0BW9CDZY83JGGHY5D19/chunks
2.7G    /var/lib/prometheus/metrics/01H968N3GENWQ97QM5MZ6NFX90/chunks
2.6G    /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD
2.6G    /var/lib/prometheus/metrics/01H8PT8S9AZSESFBPWJAEP642P/chunks
2.6G    /var/lib/prometheus/metrics/01H8RR2ATJBXZDQH0WHDG39FDJ/chunks
2.6G    /var/lib/prometheus/metrics/01H8TNVT2RKZVN1G41TDBA3RPF/chunks
2.6G    /var/lib/prometheus/metrics/01H8WKNAYQPK9HDDRYRCBG8MKA/chunks
2.6G    /var/lib/prometheus/metrics/01H8YHEYVDVMFA5CK57973S6X0/chunks
2.6G    /var/lib/prometheus/metrics/01H90F8E9QPX7DP4TN9P9775S7/chunks
2.6G    /var/lib/prometheus/metrics/01H92D22WB92WEE0GYVQ72GWBR/chunks
2.6G    /var/lib/prometheus/metrics/01H94AVJXMWNVR9DTFNCYKBPT3/chunks
2.4G    /var/lib/prometheus/metrics/01H8H0W9YSFK49GXCFGQNZ9NAD/chunks
2.4G    /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0
2.2G    /var/lib/prometheus/metrics/01H8JYP075VB7N2P2FTC38RMX0/chunks
1.8G    /var/log/icinga
1.8G    /var/log/icinga/archives
1.8G    /var/log/opensuse/hosts/OLD/2020
1.7G    /var/lib/mysql
1.6G    /var/lib/elasticsearch/nodes/0/indices/Yi8jXZsnRQy2iqBfUzaUcg
1.5G    /var/lib/mysql/icinga
1.1G    /var/log/journal
1.1G    /var/log/journal/e9dce331a4f2602b6b67b145585d2b7f
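For reference, a listing like the one above can be produced with du piped through sort. The exact command used for this ticket is not recorded, so the helper below is only a sketch: the function name largest_dirs, the -x flag (stay on one filesystem) and the default of 80 lines are assumptions.

```shell
# Print the COUNT largest directories at or below DIR, biggest first.
# du -x stays on a single filesystem; sort -h understands human-readable
# sizes such as 38G or 9.0G, and -r reverses the order.
largest_dirs() {
    dir=${1:-/var}
    count=${2:-80}
    du -xh "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Calling `largest_dirs /var 80` as root should give output in the same shape as the listing in this note.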

From the contents of /root/bin, it seems we're keeping those backups locally. Is that so? How can we improve the situation there using backup.i.o.o?

Actions #2

Updated by luc14n0 9 months ago

  • File deleted (clipboard-202308312141-csa4x.png)
Actions #3

Updated by luc14n0 9 months ago

  • Status changed from Workable to New

From the contents of /root/bin, it seems we're keeping those backups locally. Is that so? How can we improve the situation there using backup.i.o.o?

I may have made a wrong assumption about mybackup.i.o.o.

Actions #4

Updated by crameleon 9 months ago

backup.i.o.o is a good target for longer-term storage of backups. mybackup.i.o.o confused me in the past as well; I think the "my" refers to MySQL.

Actions #5

Updated by crameleon 9 months ago

81G /var/log

That shouldn't happen.

Actions #6

Updated by luc14n0 9 months ago

Would packing the logs from previous years into a tarball, to compress them further, be a quick way to gain some space for now?

Actions #7

Updated by crameleon 9 months ago

It would, but I think there is no point in keeping old log files if they're not used for any statistics gathering.

Actions #8

Updated by luc14n0 9 months ago

That's a good point. I'm going to show those old logs the door, then.

Actions #9

Updated by luc14n0 9 months ago

  • Checklist item Double check anything under /var/log is being rotated added
  • Status changed from New to Workable
  • Assignee set to luc14n0
  • % Done changed from 0 to 80

OK. After a brief exchange with Georg on IRC, I took the liberty of deleting:

31G     /var/log/opensuse/hosts/OLD

There's not much point in keeping those old logs (the oldest were from 2019) if no analysis is being done on them, so they are just "rubbish" gathering dust.

Another pile of rubbish gathering a lot of dust is:

38G     /var/log/icinga2/compat
38G     /var/log/icinga2/compat/archives

Those are Icinga 1 compatible logs. I bade farewell to them too, and disabled the feature altogether with:

icinga2 feature disable compatlog
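Icinga 2 only picks up feature changes after a restart (the feature command itself prints a reminder to that effect). A minimal sketch of the full sequence, assuming a systemd-managed icinga2 service as the systemctl output in this ticket suggests; the function name is made up for illustration:

```shell
# Disable an Icinga 2 feature and restart the daemon so the change takes effect.
disable_icinga2_feature() {
    feature=$1
    icinga2 feature disable "$feature" || return 1
    # Validate the configuration first, so that a broken config
    # does not take monitoring down on restart.
    icinga2 daemon -C || return 1
    systemctl restart icinga2
}
```

For this ticket that would be `disable_icinga2_feature compatlog`; `icinga2 feature list` shows afterwards which features are still enabled.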

Something that might be worth mentioning is that systemctl status icinga2 shows:

Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip does not support version 3/4 packets
Sep 01 23:39:20 monitor check_nrpe[23706]: Remote $our_freeipa_ip accepted a version 2 packet

Deleting the two directories above got us ~70G of free space back on /var.

Actions #10

Updated by luc14n0 9 months ago

  • Checklist item Double check anything under /var/log is being rotated set to Done
  • Status changed from Workable to Resolved
  • % Done changed from 80 to 100

Alright, found another dust gatherer:

1.8G    /var/log/icinga/archives

That directory contains old logs dating from 2017 to 2020. Sayonara to them. We should be good for now.

P.S.:

The whole of /var/log/icinga was left over from legacy Icinga 1, so I deleted it.

I also manually compressed the logs under /var/log that had been left uncompressed by past logrotate failures.

Actions #11

Updated by luc14n0 9 months ago

  • Status changed from Resolved to Feedback

OK. Running systemctl status logrotate showed many errors like this one:

Sep 02 00:51:33 monitor logrotate[3039]: error: error accessing /var/log/opensuse/hosts/OLD/: No such file or directory

How careless of me!
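A quick way to avoid this kind of surprise is to search the logrotate configuration for a path before deleting it. A small sketch; the helper name is made up, and the default locations /etc/logrotate.conf and /etc/logrotate.d are the usual ones but are assumed here:

```shell
# List logrotate config lines that mention PATTERN (fixed string, not a regex).
# Any further arguments replace the default config locations to search.
logrotate_refs() {
    pattern=$1
    shift
    [ "$#" -gt 0 ] || set -- /etc/logrotate.conf /etc/logrotate.d
    grep -rnF -- "$pattern" "$@"
}
```

`logrotate_refs /var/log/opensuse/hosts` prints each matching line with its file name and line number, and returns non-zero when nothing refers to the path. Searching for the parent directory is the safer choice, since a script may build the full path from a variable, as the postrotate script in this note does.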

I recreated the /var/log/opensuse/hosts/OLD directory. /etc/logrotate.d/rsyslog-remote is the config that rotates those host logs, with the following postrotate script:

    postrotate
        STARTDIR="/var/log/opensuse/hosts"
        LOGS=$(echo $STARTDIR/OLD/*.log.*)
        for f in $LOGS ; do
          if [ -e "$f" ] ; then
              YEAR=$(date +"%Y")
              TARGETDIR="$STARTDIR/OLD/$YEAR"
              test -d "$TARGETDIR" || mkdir -p "$TARGETDIR"
              /usr/bin/xz "$f"
              mv "$f.xz" "$TARGETDIR"/
          fi
        done
        /usr/bin/systemctl reload rsyslog
    endscript

It uses maxage 365, but that doesn't seem to be cleaning anything up, as there were logs dating back to 2019; presumably logrotate no longer tracks the files once the postrotate script has moved them into OLD/$YEAR. We could improve the script so that anything older than a year gets removed. Any thoughts?
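One possible shape for such a cleanup, as a sketch only: a find call that deletes archived files older than a given number of days and then drops the year directories left empty. The function name and the 365-day default are assumptions (the default mirrors maxage 365), and it relies on -delete and -mindepth, which both GNU and BSD find provide.

```shell
# Delete regular files under DIR that were last modified more than DAYS days ago,
# then remove subdirectories that ended up empty. DIR itself is kept.
purge_old_logs() {
    dir=$1
    days=${2:-365}
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -delete
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

`purge_old_logs /var/log/opensuse/hosts/OLD 365` could run at the end of the postrotate block or from a timer. Keeping OLD itself in place matters here, since logrotate errors out when that directory is missing.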

Actions #12

Updated by crameleon 9 months ago

Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script; logrotate is perfectly able to compress logs itself. The sorting into subdirectories by year only makes sense if there is a reason to keep year-old logs in the first place.

Actions #13

Updated by crameleon 9 months ago

I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.

Actions #14

Updated by luc14n0 9 months ago

crameleon wrote in #note-12:

Hey @luc14n0, thanks for taking care of this. I don't quite understand the reason for this postrotate script; logrotate is perfectly able to compress logs itself. The sorting into subdirectories by year only makes sense if there is a reason to keep year-old logs in the first place.

Yeah. No problem.

The comment in the conf file says:

# compress doesn't work here as it tries to rotate the
# logfile in OLD/ which is moved away already in
# postrotate
Actions #15

Updated by luc14n0 9 months ago

crameleon wrote in #note-13:

I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.

No objections from my side.

Actions #16

Updated by pjessen_invalid 9 months ago

crameleon wrote in #note-13:

I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.

Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.

Actions #17

Updated by luc14n0 9 months ago

pjessen_invalid wrote in #note-16:

crameleon wrote in #note-13:

I also think monitor.i.o.o should not be a syslog server. We might want to install a dedicated syslog server with sufficient archive storage after our move to Prague.

Wrt archiving, backup.i.o.o is already used for archiving some logfiles, e.g. for /var/log/messages and /var/log/mail from pontifex. See #127766.

That looks interesting. I'm gonna take a look at that.

Actions #18

Updated by crameleon 20 days ago

  • Status changed from Feedback to Resolved

Hi,

The failures should be resolved now. Additionally, the machine gained more disk space for new duties.
Moving the syslog server to a different machine is still a TODO, but out of scope here.
