action #57476

Recurring partitions full and logrotate fails, possibly due to disabling /var/log/openqa as log target

Added by okurz 5 months ago. Updated 5 months ago.

Status: Resolved
Priority: Urgent
Assignee: okurz
Category: -
Target version: openQA Project - Done
Start date: 28/09/2019
Due date:
% Done: 0%
Duration:

Description

Observation

There have been several alerts about space depletion on either / or /srv over the past few days. This seems to have started on the afternoon of 2019-09-25, when /srv started to fill up. Since 2019-09-26 there have been hourly spikes in space usage on /, growing in size over time:
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&fullscreen&panelId=74&from=1569378790693&to=1569669597083

I assume that since coolo changed /etc/openqa/openqa.ini to no longer log to /var/log/openqa, openQA debug messages, including SQL debugging, end up not only in the system journal but also in /var/log/messages. The logrotate service has failed because of out-of-space conditions, and /var/spool/mail/root shows why / is running out of space on an hourly basis:

Subject: Cron <root@openqa> /usr/sbin/logwatch --service dmeventd
…
cat: write error: No space left on device
system 'cat '/var/log/messages-20190928'  >> /var/cache/logwatch/logwatch.hghdmAm9/messages-archive' failed: 256 at /usr/sbin/logwatch line 772.
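To narrow down which directory is eating the space in a situation like this, a small helper can rank the immediate subdirectories of a path by size. This is only a sketch: the function name directory_sizes is made up for illustration, and it sums apparent file sizes rather than allocated blocks, so its numbers can differ from what du or df report.

```python
import os


def directory_sizes(root):
    """Return (path, bytes) for each immediate subdirectory of root, largest first."""
    sizes = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for filename in filenames:
                try:
                    # lstat so that symlinks are not followed and counted twice
                    total += os.lstat(os.path.join(dirpath, filename)).st_size
                except OSError:
                    # File vanished or is unreadable; skip it
                    pass
        sizes.append((entry.path, total))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Running it on /var (as root) would be expected to put /var/log and /var/cache near the top if the assumption above holds.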

Suggestions

History

#1 Updated by okurz 5 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz

I checked whether logwatch is using any special configuration. It seems it is not, so I uninstalled the package. If we still need it we can simply reinstall it, but then we should cover it properly with salt.

Backed up the complete /srv/log/ to backup.qa.suse.de:/home/backup/osd/srv/log/ and removed syslog with zypper rm -u syslog-service rsyslog. Made the journal persistent by creating /var/log/journal/. Manually triggered compression of log files that are no longer written to and deleted the source files. Now we are back to sane space usage levels:

/dev/vda1       9.6G  5.8G  3.4G  64% /
/dev/vdc         80G   47G   34G  59% /srv
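The manual compression step could be scripted roughly as follows. This is a minimal sketch, not what was actually run on osd: the function name compress_stale_logs and the one-day threshold are assumptions, and "not written to anymore" is approximated by the file's modification time, which is only a heuristic. It also needs enough free space to hold each compressed copy before the original is deleted.

```python
import gzip
import os
import shutil
import time


def compress_stale_logs(log_dir, min_age_seconds=86400, now=None):
    """Gzip files in log_dir not modified recently, then delete the originals.

    Returns the list of .gz files that were created.
    """
    now = time.time() if now is None else now
    compressed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path) or name.endswith(".gz"):
            continue
        if now - os.path.getmtime(path) < min_age_seconds:
            continue  # modified recently, so probably still being written to
        target = path + ".gz"
        if os.path.exists(target):
            continue  # do not overwrite an existing archive
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)  # only reached if compression succeeded
        compressed.append(target)
    return compressed
```

For example, compress_stale_logs("/srv/log") would leave the active log files alone and replace rotated ones with .gz copies, assuming the backup has been taken first.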

If this turns out to be what we want we should do the same for o3.

We decided this is fine, but we want to go back to logging to /var/log/openqa to prevent spammy logs in minion jobs, e.g. https://openqa.suse.de/minion/jobs?id=264533 .

I have now made the corresponding changes on both o3 and osd.

#2 Updated by okurz 5 months ago

  • Status changed from Feedback to Resolved
  • Target version set to Done
