action #57476
closed
Recurring partitions full and logrotate fails, possibly due to disabling /var/log/openqa as log target
0%
Description
Observation
There have been several alerts about space depletion on / or /srv over the past days. The problem seems to have started on the afternoon of 2019-09-25, when /srv began to fill up. Since 2019-09-26 there have been hourly spikes in the space usage on /, growing in size over time:
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&fullscreen&panelId=74&from=1569378790693&to=1569669597083
I assume that since coolo switched /etc/openqa/openqa.ini to no longer log messages to /var/log/openqa, openQA debug messages, including SQL debugging, end up not only in the system journal but also in /var/log/messages. The logrotate service has failed because of out-of-space conditions, and /var/spool/mail/root shows why / runs out of space on an hourly basis:
Subject: Cron <root@openqa> /usr/sbin/logwatch --service dmeventd
…
cat: write error: No space left on device
system 'cat '/var/log/messages-20190928' >> /var/cache/logwatch/logwatch.hghdmAm9/messages-archive' failed: 256 at /usr/sbin/logwatch line 772.
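To narrow down observations like the above, a small sketch along these lines might help. The function names `usage_percent` and `largest_entries` are made up for illustration and are not part of any existing tooling on osd; the sketch only assumes standard `df`, `du`, `awk`, `sort` and `head`.

```shell
# Print the use percentage (without the % sign) of the filesystem holding a path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List up to ten of the largest entries directly below a directory, biggest
# first, without crossing into other filesystems (-x). Sizes are in KiB.
largest_entries() {
    du -xsk "$1"/* 2>/dev/null | sort -rn | head -n 10
}
```

For example, `usage_percent /srv` followed by `largest_entries /var/log` should show quickly whether /var/log/messages is the main consumer. `systemctl status logrotate.service` then shows whether logrotate is still in a failed state.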
Suggestions
- As we do not have logwatch covered in http://gitlab.suse.de/openqa/salt-states-openqa I assume we do not need it anymore
- We should consider removing rsyslog and syslog-service and instead configure a persistent journal by creating the directory /var/log/journal/, as we already do in https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/worker.sls#L324
- After the two points above, crosscheck how the space usage behaves, e.g. whether openQA SQL debug information is still written to /var/log/messages
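The persistent journal suggestion could look roughly like the following when done by hand. This is only a sketch: the function name `enable_persistent_journal` is hypothetical, it needs root on a systemd host, and it assumes journald runs with its default `Storage=auto` setting. The salt state linked above may do this differently.

```shell
# Sketch: switch journald to persistent storage (run as root).
enable_persistent_journal() {
    # With Storage=auto, journald stores logs persistently once this directory exists.
    mkdir -p /var/log/journal || return 1
    # Apply the ownership and permissions that systemd ships for the directory.
    systemd-tmpfiles --create --prefix /var/log/journal || return 1
    # Move logs collected so far from /run/log/journal into /var/log/journal.
    journalctl --flush
}
```

Afterwards `journalctl --disk-usage` reports how much space the journal takes. If the journal itself becomes a space problem, `SystemMaxUse=` in journald.conf can cap it.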
Updated by okurz about 5 years ago
- Status changed from New to Feedback
- Assignee set to okurz
I checked whether logwatch uses any special configuration; it seems not, so I uninstalled the package. If we still need it we can simply reinstall it, but then we should cover it properly with salt.
Backed up the complete /srv/log/ to backup.qa.suse.de:/home/backup/osd/srv/log/ and removed syslog with zypper rm -u syslog-service rsyslog. Made the journal persistent by creating /var/log/journal/. Manually triggered compression of log files that are no longer written to and deleted the source files. Now we are back to sane space usage levels:
/dev/vda1 9.6G 5.8G 3.4G 64% /
/dev/vdc 80G 47G 34G 59% /srv
If this turns out to be what we want we should do the same for o3.
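For repeating the manual compression step on o3, something like the following sketch could be used. It is not the exact command that was run on osd. The function name `compress_stale_logs` and the file name pattern (a YYYYMMDD suffix as in messages-20190928) are assumptions, and "not written to anymore" is approximated by the modification time, so the result should be checked before relying on it.

```shell
# Compress rotated logs with a YYYYMMDD suffix whose age in whole days exceeds
# the given number (default 1). gzip replaces each source file with a .gz
# file, so the uncompressed copy is removed.
compress_stale_logs() {
    find "$1" -xdev -type f \
        -name '*-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' \
        -mtime +"${2:-1}" \
        -exec gzip -- {} +
}
```

Called as `compress_stale_logs /var/log 1`, this leaves the live log files and already compressed archives alone, because neither matches the date suffix pattern.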
We decided this is fine, but we want to go back to logging to /var/log/openqa to prevent spammy logs in minion jobs, e.g. https://openqa.suse.de/minion/jobs?id=264533 .
I have now made the corresponding changes on both o3 and osd.
Updated by okurz about 5 years ago
- Status changed from Feedback to Resolved
- Target version set to Done
Updated by okurz over 3 years ago
- Related to action #93195: [Alerting] Failed systemd services alert (except openqa.suse.de) on 2021-05-28, logrotate.service on openqaworker-arm-1 added