action #177159
closed[alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size:S
Description
Observation
This was problematic in the past, see #173947. I had a brief look at s390zl12.oqa.prg2.suse.org
but couldn't find much I could easily remove.
This caused an alert when the disk usage stayed at 88 % for two hours, see https://monitor.qa.suse.de/d/GDs390zl12/dashboard-for-s390zl12?orgId=1&viewPanel=panel-65090&from=2025-02-13T08:30:31.456Z&to=2025-02-13T09:30:26.604Z&timezone=browser&var-datasource=000000001&refresh=1m. Since disk usage is now back at 82 %, it isn't clear what caused the spike.
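For a quick overview of where the space actually goes, something like the following could be run on the host (just a sketch of common commands, not output from this incident):
df -h /                    # overall usage of the root filesystem
btrfs filesystem usage /   # allocation overview from the btrfs side
btrfs subvolume list /     # subvolumes, including snapshots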
Acceptance Criteria
- AC1: The disk usage is considerably below the 80% alert threshold
Suggestions
- Use a bigger disk, which is possible because we have a virtual device, but 40 GB should actually be enough for a special-purpose OS instance
- Limit space used by snapshots … if snapshots actually are the culprit
- As this is about the root filesystem and we have a separate one for /var/lib/libvirt/images, it should certainly be feasible to stay well below 40 GB with the root filesystem. Just use
btrfs fi du / $something
or variants to find out where we lose the space and clean up (see the sketch after this list)
- Re-run the commands from #173947#note-8 and try to make sense of the output
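A minimal sketch of these checks; the snapper config name root and the /.snapshots layout are assumptions about this host:
btrfs filesystem du -s /.snapshots/*/snapshot                  # exclusive space held by each snapshot
snapper list                                                   # which snapshots exist and why
snapper -c root set-config NUMBER_LIMIT=2-6 SPACE_LIMIT=0.3    # cap number-based snapshots and their space share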
Updated by robert.richardson 3 months ago
- Subject changed from [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` to [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size: S
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger 3 months ago
- Status changed from Workable to In Progress
- Assignee set to nicksinger
Updated by nicksinger 3 months ago
So an initial screening of the machine shows:
s390zl12:~ # mount -o subvolid=5 /dev/dasda2 /mnt/btrfs/
s390zl12:~ # du -sh /mnt/btrfs/*
89G /mnt/btrfs/@
s390zl12:~ # du -sh /mnt/btrfs/@/*
15M /mnt/btrfs/@/boot
4.0K /mnt/btrfs/@/etc
2.4M /mnt/btrfs/@/home
0 /mnt/btrfs/@/opt
184K /mnt/btrfs/@/root
0 /mnt/btrfs/@/srv
4.1M /mnt/btrfs/@/tmp
4.0K /mnt/btrfs/@/usr
21G /mnt/btrfs/@/var
s390zl12:/mnt/btrfs/@/.snapshots # btrfs filesystem du -s *
Total Exclusive Set shared Filename
5.44GiB 32.00KiB 5.44GiB 400
5.54GiB 229.93MiB 5.32GiB 646
5.45GiB 132.00KiB 5.45GiB 647
5.45GiB 4.00KiB 5.45GiB 648
5.53GiB 144.00KiB 5.53GiB 649
5.53GiB 92.00KiB 5.53GiB 650
5.53GiB 8.16MiB 5.52GiB 651
5.53GiB 152.00KiB 5.53GiB 652
5.53GiB 0.00B 5.53GiB 653
5.44GiB 132.00KiB 5.44GiB 654
5.44GiB 28.00KiB 5.44GiB 655
5.44GiB 44.00KiB 5.44GiB 656
0.00B 0.00B 0.00B grub-snapshot.cfg
so the biggest snapshot only holds 229.93 MiB exclusively. However, the var subvolume looks rather big. Checking on the live system I see:
s390zl12:/mnt/btrfs/@ # du -shx /var
4.9G /var
and following these crumbs I find that /mnt/btrfs/@/var/lib/libvirt/images/
uses 17G (the live du above shows only 4.9G because the separate images partition is mounted over that directory, hiding these stale files underneath). So we have libvirt images on the root disk which are supposed to reside on a separate disk/partition. I will clean them up and check whether I can improve boot-dependencies.
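A quick way to confirm the mount situation on the live system (a sketch, not output from this host):
findmnt /var/lib/libvirt/images   # which device currently backs the images directory
findmnt /                         # the root filesystem, for comparison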
Updated by nicksinger 3 months ago · Edited
Just for completeness, a list of these old files:
s390zl12:/mnt/btrfs/@/var/lib/libvirt/images # ls -lah
total 17G
drwx--x--x 1 root root 752 Sep 3 12:20 .
drwxr-xr-x 1 root root 88 Jun 27 2024 ..
-rw-r--r-- 1 qemu qemu 5.1G Sep 3 12:24 openQA-SUT-12a.img
-rw-r--r-- 1 qemu qemu 56M Sep 3 12:03 openQA-SUT-12.initrd
-rw-r--r-- 1 qemu qemu 8.0M Sep 3 12:03 openQA-SUT-12.kernel
-rw-r--r-- 1 root root 1.5K Sep 3 12:03 openQA-SUT-12.xml
-rw-r--r-- 1 qemu qemu 2.7G Sep 3 12:11 openQA-SUT-14a.img
-rw-r--r-- 1 qemu qemu 48M Sep 3 12:03 openQA-SUT-14.initrd
-rw-r--r-- 1 qemu qemu 7.9M Sep 3 12:03 openQA-SUT-14.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-14.xml
-rw-r--r-- 1 root root 2.8G Sep 3 12:19 openQA-SUT-17a.img
-rw-r--r-- 1 root root 56M Sep 3 12:03 openQA-SUT-17.initrd
-rw-r--r-- 1 root root 8.0M Sep 3 12:03 openQA-SUT-17.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-17.xml
-rw-r--r-- 1 root root 2.7G Sep 3 12:19 openQA-SUT-18a.img
-rw-r--r-- 1 root root 48M Sep 3 12:03 openQA-SUT-18.initrd
-rw-r--r-- 1 root root 7.9M Sep 3 12:03 openQA-SUT-18.kernel
-rw-r--r-- 1 root root 1.6K Sep 3 12:03 openQA-SUT-18.xml
-rw-r--r-- 1 root root 264M Sep 3 12:20 supp_sles15sp4_updatestack-s390x.qcow2
-rw-r--r-- 1 root root 2.6G Sep 3 12:20 supp_sles15sp5_updatestack-s390x.qcow2
All of this is auto-generated and rather old -> trash
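Before deleting, a minimal sanity check that no defined domain still references these files could look like this; the domain name is just taken from the file names above, and the stale copies live under the subvolume mount rather than the live /var/lib/libvirt/images:
virsh list --all                                         # any domains still defined or running?
virsh domblklist openQA-SUT-12                           # disks referenced by a domain, if it is still defined
rm -i /mnt/btrfs/@/var/lib/libvirt/images/openQA-SUT-*   # then remove the stale files interactively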
Updated by nicksinger 3 months ago
- File clipboard-202502191121-ropn1.png added
- Status changed from In Progress to Feedback
Updated by nicksinger 3 months ago
- Status changed from Feedback to In Progress
- Priority changed from High to Normal
My changes only order the unit and do not require the mount point to be present. A related discussion can be found in Slack. I'm looking into possible solutions.
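For context, the difference in systemd terms could look like the following hypothetical drop-in; this only illustrates ordering vs. requirement and is not the actual change from the MR:
# /etc/systemd/system/libvirtd.service.d/images-mount.conf (hypothetical)
[Unit]
# ordering only: if the mount unit is started, start libvirtd after it,
# but libvirtd still starts even when the mount is missing
After=var-lib-libvirt-images.mount
# ordering plus hard dependency (not what was chosen here): pull in the mount
# and keep libvirtd from starting without it
#RequiresMountsFor=/var/lib/libvirt/images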
Updated by openqa_review 3 months ago
- Due date set to 2025-03-06
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger 3 months ago
- Status changed from In Progress to Feedback
My MR now includes management of these storage partitions in /etc/fstab and a more complex interaction between the mount unit and libvirtd.service. To avoid automatically breaking both workers at the same time, I removed the entry from top.sls in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1380 and only introduce the state itself for now. After this is merged, I can test with `state.apply libvirt.storage` on a single host and only enable it everywhere once everything works as expected.
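A sketch of how that staged rollout could look from the salt master; the target is an assumption:
salt 's390zl12*' state.apply libvirt.storage test=True   # dry run on one host first
salt 's390zl12*' state.apply libvirt.storage             # apply for real once the dry run looks sane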
Updated by nicksinger 3 months ago
Initial MR tested and deployed on zl12+13; fixups and permanent enablement are in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1387
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1386 is a side product of this.
Updated by nicksinger 3 months ago
- Status changed from Feedback to Resolved
Updated by okurz 3 months ago
- Subject changed from [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size: S to [alert] Disk `/dev/dasda2` (the btrfs root filesystem) is quite full (over 80 %) on `s390zl12.oqa.prg2.suse.org` size:S
- Due date deleted (2025-03-06)
- Status changed from Resolved to Workable
We found that s390zl12+13 have unaccepted salt keys, potentially related to this ticket, although taking those machines out of production was never mentioned anywhere. Maybe somebody else on alert duty did that.
@nicksinger please accept salt keys, apply a high state, monitor and resolve at your convenience.
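For reference, the usual steps on the salt master; the minion IDs are assumptions:
salt-key -l unaccepted                    # list pending keys
salt-key -a s390zl12.oqa.prg2.suse.org    # accept the key, likewise for s390zl13
salt 's390zl12*' state.highstate          # apply a high state, likewise for s390zl13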
Updated by nicksinger 3 months ago
- Status changed from Workable to In Progress
Machines were added again and a highstate applied cleanly. Checking some instances and jobs on OSD now.