action #57680

o3: change /var/log to bind mount to prevent out-of-space (was: o3 root volume very limited, nearly out of space soon again)

Added by okurz 5 months ago. Updated 4 months ago.

Status: Workable
Start date: 03/10/2019
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: openQA Project - Ready
Duration:

Description

Observation

/dev/vda1       9.6G  7.6G  1.6G  83% /
# fdisk -l
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x025b0d73

Device     Boot    Start      End  Sectors   Size Id Type
/dev/vda1  *        2048 20450744 20448697   9.8G 83 Linux
/dev/vda2       20451328 20964824   513497 250.7M 82 Linux swap / Solaris

We do not have much headroom, and we increasingly run into out-of-space conditions caused by mistakes, logrotate not running frequently enough, or an overflowing mail spool directory.
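To see what is actually filling the root volume, something like the following could help. It is only a sketch: `largest_dirs` is a hypothetical helper name, and it assumes GNU `du` and `sort`.

```shell
# Hypothetical helper: list the largest first-level subdirectories of a path.
# -x keeps du on one filesystem, so other mounts such as /space are not counted.
# Sizes are printed in KiB, smallest first, so the biggest entries end up last.
largest_dirs() {
    dir=${1:-/var}
    count=${2:-10}
    du -x -k --max-depth=1 "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

# Example (run as root to avoid permission gaps):
#   largest_dirs /var 5
```

The entry for the directory itself is the total, so it normally shows up at the bottom of the list.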

Suggestion

We could save more space by moving some directories to other partitions, e.g. /var/log to /space, similar to osd where /var/log points to /srv/log on another partition. Alternatively, we simply ask engineering infrastructure to increase the partition size. For now I tend to do … both, actually.

History

#1 Updated by okurz 5 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Current Sprint

systemctl stop systemd-journald apache2 openqa-webui openqa-websockets openqa-livehandler openqa-gru openqa-scheduler \
  && rsync -aHP /var/log/ /space/log/ \
  && mv /var/log/ /var/log.old/ \
  && ln -s /space/log log \
  && systemctl start systemd-journald apache2 openqa-webui openqa-websockets openqa-livehandler openqa-gru openqa-scheduler

But then openQA failed to start up because apparmor denied it access to /var/log/openqa. However, there was no corresponding entry in /var/log/audit/audit.log because auditd had not logged any event after 2019-09-20. I restarted auditd and it was fine again.

So for now I rolled back the change and set the apparmor profile to enforce mode again:

systemctl stop systemd-journald apache2 openqa-webui openqa-websockets openqa-livehandler openqa-gru openqa-scheduler \
  && rsync -aHP /space/log/ /var/log.old/ \
  && rm /var/log \
  && mv /var/log.old/ /var/log/ \
  && aa-enforce /etc/apparmor.d/usr.share.openqa.script.openqa \
  && systemctl start systemd-journald apache2 openqa-webui openqa-websockets openqa-livehandler openqa-gru openqa-scheduler

We can now look more carefully into the actions denied by apparmor before moving to /space/log again.

With aa-logprof -f /var/log/audit/audit.log.1 there is the suggestion to cover

+  owner /space/log/openqa wk,

in the apparmor profile. I guess it would be better if we use a bind-mount instead of symlink for /space/log aka /var/log. WDYT?
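A minimal sketch of what the bind-mount variant could look like, assuming /space/log already holds the copied logs and /var/log exists as a plain (empty) directory. `bind_fstab_entry` is a hypothetical helper name; it only prints the fstab line and does not mount or write anything.

```shell
# Hypothetical helper: print the /etc/fstab entry for a bind mount.
# Nothing is mounted or written; review the output before appending it.
bind_fstab_entry() {
    src=$1      # directory that really holds the data, e.g. /space/log
    target=$2   # path under which it should appear, e.g. /var/log
    printf '%s %s none bind 0 0\n' "$src" "$target"
}

bind_fstab_entry /space/log /var/log

# One-off equivalent as root, with the logging services stopped:
#   mount --bind /space/log /var/log
```

With a bind mount the services keep opening paths below /var/log, so the existing /var/log/openqa rule in the apparmor profile should presumably still match, which the symlink variant did not achieve.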

#2 Updated by okurz 5 months ago

  • Status changed from In Progress to Feedback

#3 Updated by cdywan 5 months ago

okurz wrote:

But then openQA failed to start up because it was denied access to /var/log/openqa by apparmor. Apparently though there is no entry in /var/log/audit/audit.log because auditd did not log any event after 2019-09-20. I restarted auditd and it was fine again.

/var/log/openqa was already in there, wasn't it?

Also, why did auditd break? Might be worth investigating that as well.
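As a starting point for noticing this earlier, a simple staleness check on the audit log could be scripted. This is a sketch under assumptions: `is_stale` is a hypothetical helper, the one-day threshold is arbitrary, and it relies on GNU `stat`.

```shell
# Hypothetical helper: succeed if a file was last modified more than
# max_age seconds ago. Assumes GNU stat (-c %Y prints mtime as epoch seconds).
# Returns 2 if the file cannot be inspected at all.
is_stale() {
    file=$1
    max_age=$2
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2
    [ $((now - mtime)) -gt "$max_age" ]
}

# Example: warn if the audit log has not been written to for a day.
#   if is_stale /var/log/audit/audit.log 86400; then
#       echo "audit.log looks stale; check 'systemctl status auditd'"
#   fi
```

This only detects that logging stopped; it does not explain why auditd stopped writing.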

With aa-logprof -f /var/log/audit/audit.log.1 there is the suggestion to cover


+ owner /space/log/openqa wk,

in the apparmor profile. I guess it would be better if we use a bind-mount instead of symlink for /space/log aka /var/log. WDYT?

If /space/log is meant to fully replace /var/log I'd say a bind mount seems like the most natural choice to me. It would avoid apparmor changes or people looking in the wrong place.

What things do we actually want in /var/log btw that shouldn't be covered by systemd journal?

#4 Updated by okurz 4 months ago

  • Subject changed from o3 root volume very limited, nearly out of space soon again to o3: change /var/log to bind mount to prevent out-of-space (was: o3 root volume very limited, nearly out of space soon again)
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Priority changed from High to Normal
  • Target version changed from Current Sprint to Ready
