action #162719
Parent: coordination #162716: [epic] Better use of storage on OSD workers (closed)
Ensure w40 has more space for worker pool directories size:S
Status: Resolved
Priority: High
Category: Feature requests
Start date: 2024-06-21
% Done: 0%
Description
Motivation
w40 ran out of space in /var/lib/openqa despite having another partition with multiple TB of free space. We should reconsider the choices we made when setting up the OSD PRG2 workers.
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1     259:1    0   5.8T  0 disk
├─nvme0n1p1 259:2    0   512M  0 part  /boot/efi
├─nvme0n1p2 259:3    0   5.8T  0 part  /var
│                                      …
│                                      /
└─nvme0n1p3 259:4    0     1G  0 part  [SWAP]
nvme2n1     259:5    0 476.9G  0 disk
└─md127       9:127  0 476.8G  0 raid0 /var/lib/openqa
# hdparm -tT /dev/nvme?n1
/dev/nvme0n1:
Timing cached reads: 30178 MB in 1.99 seconds = 15202.23 MB/sec
Timing buffered disk reads: 6360 MB in 3.00 seconds = 2120.00 MB/sec
/dev/nvme2n1:
Timing cached reads: 33204 MB in 1.98 seconds = 16739.11 MB/sec
Timing buffered disk reads: 8478 MB in 3.00 seconds = 2825.74 MB/sec
Based on the buffered disk reads, nvme2n1 seems to be about 30% faster (2825.74 vs. 2120.00 MB/sec) but is far more limited in space (476.9G vs. 5.8T).
Acceptance criteria
- AC1: w40 has significantly more space than 500G for pool+cache combined
Suggestions
- Maybe nevertheless put the pool on nvme0? Or combine both devices by using nvme2n1 as a smart backing cache (see the sketch after this list)?
- Change the mount points using the salt states in https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/openqa/nvme_store?ref_type=heads that prepare the devices
- Let the systemd services recreate the filesystem and folder structure accordingly
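A minimal sketch of the caching idea, assuming lvmcache is used; the partition nvme0n1p4, the volume group name vg_openqa and all sizes are hypothetical and not taken from the actual setup:
# Sketch only: large origin LV on the 5.8T nvme0n1, fast cache on the nvme2n1 RAID0
pvcreate /dev/nvme0n1p4 /dev/md127        # nvme0n1p4 is a hypothetical spare partition
vgcreate vg_openqa /dev/nvme0n1p4 /dev/md127
lvcreate -n pool -L 2T vg_openqa /dev/nvme0n1p4      # big, slower origin volume
lvcreate -n pool_cache -L 400G vg_openqa /dev/md127  # fast cache volume
# attach the cache; writethrough keeps the data intact if the cache device fails
lvconvert --type cache --cachevol pool_cache --cachemode writethrough vg_openqa/pool
mkfs.xfs /dev/vg_openqa/pool
mount /dev/vg_openqa/pool /var/lib/openqa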
Out of scope
Rollback steps
Updated by okurz 5 months ago
- Copied to action #162725: After w40 reconsider storage use for other OSD workers size:S added
Updated by okurz 5 months ago
- Related to action #162602: [FIRING:1] worker40 (worker40: CPU load alert openQA worker40 salt cpu_load_alert_worker40 worker) size:S added
Updated by dheidler 4 months ago · Edited
(Using sda for the old disk and sdb for the new root-fs disk here, as it is shorter.)
This describes how to move the root fs to the smaller disk (here sdb). The script from salt will automatically use the other disk for /var/lib/openqa. A consolidated command sketch follows the list.
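Regarding the first point below, a minimal illustration of the two UUID types as blkid reports them (the values here are invented):
# UUID= is the filesystem UUID, PARTUUID= is the GPT partition UUID
blkid /dev/sdb2
# /dev/sdb2: UUID="0f4f83b1-..." TYPE="btrfs" PARTUUID="8e3c2a10-..."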
- be aware that there is a difference between partition UUID and filesystem UUID - use blkid to view both
- unmounted /var/lib/openqa
- online-resized the existing btrfs filesystem and partition
- copied over the data using dd
- copied over the GPT table using sgdisk (e.g. sgdisk /dev/sda -R /dev/sdb)
- generated new partition GUIDs (sgdisk -G /dev/sdb)
- generated a new UUID for the new btrfs filesystem: btrfstune -u /dev/sdb2
- generated a new UUID for the new vfat EFI partition: mlabel -s -n :: -i /dev/sdb1
- dealt with the swap partition (made sure it is in the right place on the new disk)
- mounted the new root filesystem: mount /dev/sdb2 /mnt
- replaced the old btrfs UUID with the new one in /mnt/etc/fstab and /mnt/boot/grub/grub.cfg
- unmounted the old EFI partition (umount /boot/efi) and mounted the new one: mount /dev/sdb1 /boot/efi
- replaced the old btrfs UUID with the new one in /boot/efi/EFI/opensuse/grub.cfg
- updated the bootloader in the EFI vars using update-bootloader --install
- made sure the right boot partition is set in EFI using bootctl and efibootmgr -v
- if needed, removed the old boot entry using efibootmgr --delete -b XXXX
- rebooted
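The same procedure as a hedged shell sketch, using the sda/sdb naming from above; the partition numbers, the 400G target size and the $OLD_UUID/$NEW_UUID variables are placeholders to adapt, and the copy steps would ideally run from a rescue system:
# sketch only - verify every device name and UUID (blkid) before running
umount /var/lib/openqa
btrfs filesystem resize 400G /            # shrink the root fs so it fits on sdb
# then shrink the partition accordingly, e.g. with parted resizepart
sgdisk /dev/sda -R /dev/sdb               # replicate the GPT onto the new disk
sgdisk -G /dev/sdb                        # randomize disk and partition GUIDs
dd if=/dev/sda1 of=/dev/sdb1 bs=4M        # copy the EFI partition
dd if=/dev/sda2 of=/dev/sdb2 bs=64M status=progress  # copy the root fs
btrfstune -u /dev/sdb2                    # new UUID for the new btrfs filesystem
mlabel -s -n :: -i /dev/sdb1              # new serial for the new vfat EFI partition
mkswap /dev/sdb3                          # recreate swap on the new disk
mount /dev/sdb2 /mnt
sed -i "s/$OLD_UUID/$NEW_UUID/g" /mnt/etc/fstab /mnt/boot/grub/grub.cfg
umount /boot/efi
mount /dev/sdb1 /boot/efi
sed -i "s/$OLD_UUID/$NEW_UUID/g" /boot/efi/EFI/opensuse/grub.cfg
update-bootloader --install               # register the new boot entry in the EFI vars
bootctl                                   # verify the boot setup ...
efibootmgr -v                             # ... and the EFI boot entries
# efibootmgr --delete -b XXXX             # drop the stale old entry if needed
reboot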
Updated by dheidler 4 months ago
- Status changed from In Progress to Feedback
Increase number of worker slots on w40 again.
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/866