action #177159
[alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size: S
Status: open · Done: 0%
Description
Observation
This was problematic in the past, see #173947. I had a brief look at s390zl12.oqa.prg2.suse.org
but couldn't find much I could easily remove.
This caused an alert when the disk usage stayed at 88 % for two hours, see https://monitor.qa.suse.de/d/GDs390zl12/dashboard-for-s390zl12?orgId=1&viewPanel=panel-65090&from=2025-02-13T08:30:31.456Z&to=2025-02-13T09:30:26.604Z&timezone=browser&var-datasource=000000001&refresh=1m. Since disk usage is now back at 82 %, it isn't clear what caused the spike.
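As a first orientation when picking this up, the current usage could be checked directly on the host, for example (just a sketch, these commands are not from the ticket):
df -h /                     # overall usage of the root filesystem
btrfs filesystem usage /    # how btrfs splits that between data and metadata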
Acceptance Criteria
- AC1: The disk usage is considerably below the 80% alert threshold
Suggestions
- Use a bigger disk, which is possible because we have a virtual device, but 40 GB should actually be enough for a special-purpose OS instance
- Limit space used by snapshots … if snapshots actually are the culprit
- As this is about the root filesystem and we have a separate one for /var/lib/libvirt/images, it should certainly be feasible to stay well below 40 GB on the root filesystem. Just use
  btrfs fi du / $something
  or variants to find out where we lose the space and clean up, see the sketch below
- Re-run commands from #173947#note-8 again and try to make sense of the output.
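A rough sketch of that narrowing-down (not from the ticket; assumes the usual SUSE btrfs layout with snapshots under /.snapshots):
btrfs filesystem df /                                        # overall data/metadata usage
btrfs filesystem du -s /.snapshots/*                         # exclusive space held by each snapshot
du -xh --max-depth=2 / 2>/dev/null | sort -h | tail -n 20    # biggest directories on the root fs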
Updated by robert.richardson 4 days ago
- Subject changed from [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` to [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size: S
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger 3 days ago
- Status changed from Workable to In Progress
- Assignee set to nicksinger
Updated by nicksinger 3 days ago
So an initial screening of the machine shows:
s390zl12:~ # mount -o subvolid=5 /dev/dasda2 /mnt/btrfs/
s390zl12:~ # du -sh /mnt/btrfs/*
89G /mnt/btrfs/@
s390zl12:~ # du -sh /mnt/btrfs/@/*
15M /mnt/btrfs/@/boot
4.0K /mnt/btrfs/@/etc
2.4M /mnt/btrfs/@/home
0 /mnt/btrfs/@/opt
184K /mnt/btrfs/@/root
0 /mnt/btrfs/@/srv
4.1M /mnt/btrfs/@/tmp
4.0K /mnt/btrfs/@/usr
21G /mnt/btrfs/@/var
s390zl12:/mnt/btrfs/@/.snapshots # btrfs filesystem du -s *
Total Exclusive Set shared Filename
5.44GiB 32.00KiB 5.44GiB 400
5.54GiB 229.93MiB 5.32GiB 646
5.45GiB 132.00KiB 5.45GiB 647
5.45GiB 4.00KiB 5.45GiB 648
5.53GiB 144.00KiB 5.53GiB 649
5.53GiB 92.00KiB 5.53GiB 650
5.53GiB 8.16MiB 5.52GiB 651
5.53GiB 152.00KiB 5.53GiB 652
5.53GiB 0.00B 5.53GiB 653
5.44GiB 132.00KiB 5.44GiB 654
5.44GiB 28.00KiB 5.44GiB 655
5.44GiB 44.00KiB 5.44GiB 656
0.00B 0.00B 0.00B grub-snapshot.cfg
So the biggest snapshot only takes up 229.93 MiB exclusively. However, the var subvolume looks rather big. Checking on the live system I can see:
s390zl12:/mnt/btrfs/@ # du -shx /var
4.9G /var
Following these crumbs I find that /mnt/btrfs/@/var/lib/libvirt/images/
uses 17G. The live /var only shows 4.9G, presumably because the separate images filesystem is mounted on top of /var/lib/libvirt/images and du -shx does not descend into it, so the stale files underneath the mount point stay hidden. So we have libvirt images on the root disk which are supposed to reside on a separate disk/partition. I will clean them up and check whether I can improve the boot dependencies.
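A quick way to make that visible, using the paths mounted above (just a sketch, not from the original comment):
du -shx /var /mnt/btrfs/@/var                                          # live view vs. root-subvolume view
du -sh /var/lib/libvirt/images /mnt/btrfs/@/var/lib/libvirt/images     # mounted images fs vs. what hides below it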
Updated by nicksinger 3 days ago · Edited
Just for completeness, a list of these old files:
s390zl12:/mnt/btrfs/@/var/lib/libvirt/images # ls -lah
total 17G
drwx--x--x 1 root root 752 Sep 3 12:20 .
drwxr-xr-x 1 root root 88 Jun 27 2024 ..
-rw-r--r-- 1 qemu qemu 5.1G Sep 3 12:24 openQA-SUT-12a.img
-rw-r--r-- 1 qemu qemu 56M Sep 3 12:03 openQA-SUT-12.initrd
-rw-r--r-- 1 qemu qemu 8.0M Sep 3 12:03 openQA-SUT-12.kernel
-rw-r--r-- 1 root root 1.5K Sep 3 12:03 openQA-SUT-12.xml
-rw-r--r-- 1 qemu qemu 2.7G Sep 3 12:11 openQA-SUT-14a.img
-rw-r--r-- 1 qemu qemu 48M Sep 3 12:03 openQA-SUT-14.initrd
-rw-r--r-- 1 qemu qemu 7.9M Sep 3 12:03 openQA-SUT-14.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-14.xml
-rw-r--r-- 1 root root 2.8G Sep 3 12:19 openQA-SUT-17a.img
-rw-r--r-- 1 root root 56M Sep 3 12:03 openQA-SUT-17.initrd
-rw-r--r-- 1 root root 8.0M Sep 3 12:03 openQA-SUT-17.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-17.xml
-rw-r--r-- 1 root root 2.7G Sep 3 12:19 openQA-SUT-18a.img
-rw-r--r-- 1 root root 48M Sep 3 12:03 openQA-SUT-18.initrd
-rw-r--r-- 1 root root 7.9M Sep 3 12:03 openQA-SUT-18.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-18.xml
-rw-r--r-- 1 root root 264M Sep 3 12:20 supp_sles15sp4_updatestack-s390x.qcow2
-rw-r--r-- 1 root root 2.6G Sep 3 12:20 supp_sles15sp5_updatestack-s390x.qcow2
All of this is auto-generated and rather old -> trash
Updated by nicksinger 3 days ago
- File clipboard-202502191121-ropn1.png added
- Status changed from In Progress to Feedback
Updated by nicksinger 3 days ago
- Status changed from Feedback to In Progress
- Priority changed from High to Normal
My changes only order the unit after the mount and do not require the mount point to be present. A related discussion can be found in Slack; I'm looking into possible solutions.
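For illustration only (this is not the actual salt change; the drop-in path and mount unit name are assumptions derived from /var/lib/libvirt/images): a hard requirement instead of plain ordering could be expressed with RequiresMountsFor=:
mkdir -p /etc/systemd/system/libvirtd.service.d
cat > /etc/systemd/system/libvirtd.service.d/images-mount.conf <<'EOF'
[Unit]
# RequiresMountsFor= both orders libvirtd after the mount and requires it,
# so the service does not start while the images filesystem is missing.
# A plain After=var-lib-libvirt-images.mount would only order the units,
# which is the current behaviour described above.
RequiresMountsFor=/var/lib/libvirt/images
EOF
systemctl daemon-reload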
Updated by openqa_review 3 days ago
- Due date set to 2025-03-06
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger 1 day ago
- Status changed from In Progress to Feedback
My MR now includes management of these storage partitions in /etc/fstab and a more complex interaction between the mount unit and libvirtd.service. To avoid automatically breaking both workers at the same time, I removed the entry from top.sls in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1380 and just introduce the state first. After this is merged, I can test with state.apply libvirt.storage
on a single host and only roll it out everywhere once everything works as expected.
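That single-host test could look roughly like this (target name taken from the ticket; test=True only previews the pending changes):
salt 's390zl12.oqa.prg2.suse.org' state.apply libvirt.storage test=True    # dry run, show what would change
salt 's390zl12.oqa.prg2.suse.org' state.apply libvirt.storage              # apply for real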