Recurring partitions full and logrotate fails, possibly due to disabling /var/log/openqa as log target
There have been some alerts about space depletion on either / or /srv in the past days. This seems to have started on the afternoon of 2019-09-25 when /srv started to fill up. From 2019-09-26 on there are hourly spikes in the space usage on /, increasing in size over time.
I assume that since coolo switched /etc/openqa/openqa.ini to no longer log messages to /var/log/openqa, openQA debug messages, including SQL debugging, end up not only in the system journal but also in /var/log/messages. The logrotate service has failed because of out-of-space conditions, and /var/spool/mail/root shows why / is running out of space on an hourly basis:
Subject: Cron <root@openqa> /usr/sbin/logwatch --service dmeventd …
cat: write error: No space left on device
system 'cat '/var/log/messages-20190928' >> /var/cache/logwatch/logwatch.hghdmAm9/messages-archive' failed: 256 at /usr/sbin/logwatch line 772.
- As we do not have logwatch covered in http://gitlab.suse.de/openqa/salt-states-openqa I assume we do not need it anymore
- We should consider removing syslog-service and instead configure a persistent journal by creating the directory /var/log/journal/ as we already do in https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/worker.sls#L324
- After the above two points, crosscheck how the space usage behaves, e.g. whether openQA SQL debug information is still written to /var/log/messages
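The crosscheck in the last point could be scripted roughly as follows. This is only a sketch: the helper names usage_percent and count_matches are hypothetical, and the search pattern for SQL debug lines is an assumption because the exact message format is not shown in this ticket.

```shell
#!/bin/sh
# Hypothetical helpers for the crosscheck; names and patterns are
# illustrative assumptions, not part of openQA or salt-states-openqa.

# Print the used-space percentage (e.g. "64%") of the filesystem holding PATH.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Print how many lines of FILE match PATTERN (case-insensitive).
# Prints 0 if the file does not exist or is not readable.
count_matches() {
    if [ -r "$1" ]; then
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so the exit status is ignored here.
        grep -c -i -e "$2" "$1" || true
    else
        echo 0
    fi
}
```

For example, `usage_percent /srv` and `count_matches /var/log/messages 'select '` could be run before and after the change; the pattern 'select ' is a guess at what SQL debug output contains.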
#1 Updated by okurz almost 2 years ago
- Status changed from New to Feedback
- Assignee set to okurz
I checked whether logwatch is using any special configuration and it seems not, so I uninstalled the package. If we still need it we can simply reinstall it, but then we should cover it properly with salt.
Backed up the complete /srv/log/ to backup.qa.suse.de:/home/backup/osd/srv/log/ and removed syslog with zypper rm -u syslog-service rsyslog. Made the journal persistent by creating /var/log/journal/. Manually triggered compression of log files that are no longer written to and deleted the source files. Now we are back to sane space usage levels:
/dev/vda1 9.6G 5.8G 3.4G 64% /
/dev/vdc 80G 47G 34G 59% /srv
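The manual compression step could be reproduced with something like the sketch below. It assumes that "no longer written to" can be approximated by the file modification time, which is not stated above; the function name compress_stale_logs and the default age threshold are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compress files under DIR that were last modified
# more than DAYS days ago (find's "-mtime +DAYS"). Treating the mtime as
# "no longer written to" is an assumption, not necessarily how the files
# were selected on osd.
compress_stale_logs() {
    dir=$1
    days=${2:-7}
    # gzip replaces each source file with FILE.gz, so the uncompressed
    # original is gone once compression succeeds. Files that are already
    # compressed are skipped.
    find "$dir" -type f ! -name '*.gz' ! -name '*.xz' ! -name '*.bz2' \
        -mtime +"$days" -exec gzip -9 {} +
}
```

A call such as `compress_stale_logs /srv/log 7` would cover the directory that was backed up above; running it only after the backup has completed keeps the uncompressed originals available elsewhere.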
If this turns out to be what we want we should do the same for o3.
We decided it's fine, but we want to go back to logging to /var/log/openqa to prevent spammy logs in minion jobs, e.g. https://openqa.suse.de/minion/jobs?id=264533 .
I have now made the corresponding changes on both o3 and osd.
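To verify the log target on both hosts, a small lookup like the following sketch could be used. It assumes the target is set through a `file` key in a `[logging]` section of /etc/openqa/openqa.ini, which this ticket does not spell out; the function name ini_get is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY in [SECTION] of an INI file.
# Prints nothing if the key is not set. Comment lines starting with
# "#" or ";" are ignored.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }
        /^[ \t]*\[.*\]/ {
            # Section header: remember the name between the brackets.
            line = $0
            sub(/^[ \t]*\[/, "", line)
            sub(/\].*$/, "", line)
            current = line
            next
        }
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}
```

Under the stated assumption, `ini_get /etc/openqa/openqa.ini logging file` should print /var/log/openqa on both o3 and osd after the change.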