action #57476

closed

Recurring partitions full and logrotate fails, possibly due to disabling /var/log/openqa as log target

Added by okurz about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2019-09-28
Due date:
% Done: 0%
Estimated time:
Description

Observation

There have been some alerts about space depletion on either / or /srv over the past few days. This seems to have started on the afternoon of 2019-09-25, when /srv started to fill up. From 2019-09-26 on there are hourly spikes in space usage on /, increasing in size over time:
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&fullscreen&panelId=74&from=1569378790693&to=1569669597083

I assume that since coolo switched /etc/openqa/openqa.ini to no longer log messages to /var/log/openqa, openQA debug messages, including SQL debugging, end up not only in the system journal but also in /var/log/messages. The logrotate service has failed because of out-of-space conditions, and /var/spool/mail/root shows why / is running out of space on an hourly basis:

Subject: Cron <root@openqa> /usr/sbin/logwatch --service dmeventd
…
cat: write error: No space left on device
system 'cat '/var/log/messages-20190928'  >> /var/cache/logwatch/logwatch.hghdmAm9/messages-archive' failed: 256 at /usr/sbin/logwatch line 772.
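The kind of check behind the space alerts mentioned above can be sketched in a few lines of Python. This is only an illustration and not the monitoring actually used here: the function names and the 90% threshold are assumptions, and the percentage is computed as used/total, which can differ slightly from what df reports.

```python
import shutil


def usage_percent(path):
    """Return the used space of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def partitions_over(paths, threshold_percent):
    """Return those paths whose filesystem usage is at or above the threshold."""
    return [p for p in paths if usage_percent(p) >= threshold_percent]


if __name__ == "__main__":
    # "/" and "/srv" are the two mount points named in this ticket;
    # the 90% threshold is a hypothetical value chosen for illustration.
    for mount in ("/", "/srv"):
        try:
            print(f"{mount}: {usage_percent(mount):.1f}% used")
        except FileNotFoundError:
            print(f"{mount}: not present on this machine")
    print("over threshold:", partitions_over(["/"], 90))
```

Run from cron or a systemd timer, a check like this would show the hourly spikes on / before logrotate starts failing.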

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #93195: [Alerting] Failed systemd services alert (except openqa.suse.de) on 2021-05-28, logrotate.service on openqaworker-arm-1 | Resolved | okurz | 2021-05-28 | 2021-06-11

Actions #1

Updated by okurz about 5 years ago

  • Status changed from New to Feedback
  • Assignee set to okurz

I checked whether logwatch is using any special configuration and it seems it is not, so I uninstalled the package. If we still need it we can simply reinstall it, but we should then cover it properly with salt.

Backed up the complete /srv/log/ to backup.qa.suse.de:/home/backup/osd/srv/log/ and removed syslog with zypper rm -u syslog-service rsyslog. Made the journal persistent by creating /var/log/journal/. Manually triggered compression of log files that are no longer written to and deleted the source files. Now we are back to sane space usage levels:

/dev/vda1       9.6G  5.8G  3.4G  64% /
/dev/vdc         80G   47G   34G  59% /srv

If this turns out to be what we want we should do the same for o3.

We decided it's fine, but we want to go back to logging to /var/log/openqa to prevent spammy logs in minion jobs, e.g. https://openqa.suse.de/minion/jobs?id=264533 .

I have now made the corresponding changes on both o3 and osd.
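For reference, the manual step "compress log files that are no longer written to and delete the source files" could be sketched roughly as follows. This is not the command that was actually run: the function name, the use of gzip and the one-day modification-time cutoff are assumptions made for illustration. It is meant for a plain log directory such as /srv/log/, not for /var/log/journal/.

```python
import gzip
import shutil
import time
from pathlib import Path


def compress_stale_logs(log_dir, min_age_seconds=24 * 3600, now=None):
    """Gzip regular files under log_dir that have not been modified for
    min_age_seconds, delete the originals, and return the new .gz paths.

    Assumption: an old modification time is used as a heuristic for
    "not written to anymore".
    """
    now = time.time() if now is None else now
    compressed = []
    # Materialize the listing first so files created below are not revisited.
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file() or path.suffix in (".gz", ".xz", ".bz2"):
            continue  # skip directories and already compressed files
        if now - path.stat().st_mtime < min_age_seconds:
            continue  # recently modified, probably still in use
        target = path.with_name(path.name + ".gz")
        if target.exists():
            continue  # never overwrite an existing archive
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # delete the source file after compression
        compressed.append(target)
    return compressed
```

Calling compress_stale_logs("/srv/log") would leave recently modified files alone and replace older ones with .gz copies. Taking the backup first, as described above, keeps the operation reversible.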

Actions #2

Updated by okurz about 5 years ago

  • Status changed from Feedback to Resolved
  • Target version set to Done
Actions #3

Updated by okurz over 3 years ago

  • Related to action #93195: [Alerting] Failed systemd services alert (except openqa.suse.de) on 2021-05-28, logrotate.service on openqaworker-arm-1 added