action #162719
closed
coordination #162716: [epic] Better use of storage on OSD workers
Ensure w40 has more space for worker pool directories size:S
Added by okurz 7 months ago.
Updated 7 months ago.
Category: Feature requests
Description
Motivation
w40 ran out of space in /var/lib/openqa despite having another partition with multiple TB of free space. We should reconsider the choices we made for setting up OSD PRG2 workers.
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1     259:1    0   5.8T  0 disk
├─nvme0n1p1 259:2    0   512M  0 part  /boot/efi
├─nvme0n1p2 259:3    0   5.8T  0 part  /var
…
│                                      /
└─nvme0n1p3 259:4    0     1G  0 part  [SWAP]
nvme2n1     259:5    0 476.9G  0 disk
└─md127       9:127  0 476.8G  0 raid0 /var/lib/openqa
# hdparm -tT /dev/nvme?n1
/dev/nvme0n1:
Timing cached reads: 30178 MB in 1.99 seconds = 15202.23 MB/sec
Timing buffered disk reads: 6360 MB in 3.00 seconds = 2120.00 MB/sec
/dev/nvme2n1:
Timing cached reads: 33204 MB in 1.98 seconds = 16739.11 MB/sec
Timing buffered disk reads: 8478 MB in 3.00 seconds = 2825.74 MB/sec
nvme2n1 seems to be about 33% faster in buffered disk reads (2825.74 vs. 2120.00 MB/sec) but is more limited in space.
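The relative difference can be recomputed from the hdparm figures above. This is a minimal sketch, assuming awk is available; the helper name speedup_percent is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: by how many percent is rate $2 higher than rate $1?
speedup_percent() {
    awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", (other / base - 1) * 100 }'
}

# Buffered disk reads in MB/sec, nvme0n1 vs. nvme2n1 (values from the hdparm output above)
speedup_percent 2120.00 2825.74
# Cached reads in MB/sec, nvme0n1 vs. nvme2n1
speedup_percent 15202.23 16739.11
```

The first call covers the buffered disk reads, the second the cached reads, where the gap between the two disks is much smaller.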
Acceptance criteria
- AC1: w40 has significantly more space than 500G for pool+cache combined
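One way AC1 could be checked on the worker is sketched below. It assumes that pool and cache both live below /var/lib/openqa and takes the 500G from AC1 as the threshold; the function name has_more_than_gib is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: is the filesystem holding PATH larger than MIN_GIB in total?
# Usage: has_more_than_gib PATH MIN_GIB
# Returns 0 if larger, 1 if not, 2 if the size could not be determined.
has_more_than_gib() {
    # df -P prints one POSIX-format line per filesystem, -k reports 1024-byte blocks
    size_kib=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $2 }')
    [ -n "$size_kib" ] || return 2
    [ $((size_kib / 1024 / 1024)) -gt "$2" ]
}

# Assumption: pool and cache directories are both below /var/lib/openqa
if has_more_than_gib /var/lib/openqa 500; then
    echo "/var/lib/openqa is larger than 500 GiB"
else
    echo "/var/lib/openqa is not larger than 500 GiB (or could not be checked)"
fi
```

This only compares the total filesystem size; what counts as "significantly more" is left to whoever verifies the ticket.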
Suggestions
Out of scope
Rollback steps
- Tracker changed from coordination to action
- Project changed from QA (public) to openQA Infrastructure (public)
- Copied to action #162725: After w40 reconsider storage use for other OSD workers size:S added
- Description updated (diff)
- Target version changed from future to Ready
- Subject changed from Ensure w40 has more space for worker pool directories to Ensure w40 has more space for worker pool directories size:S
- Description updated (diff)
- Category set to Feature requests
- Status changed from New to Workable
- Status changed from Workable to Blocked
- Assignee set to okurz
Actually because w40 is critical we should block on #158146
- Status changed from Blocked to Workable
- Related to action #162602: [FIRING:1] worker40 (worker40: CPU load alert openQA worker40 salt cpu_load_alert_worker40 worker) size:S added
- Assignee set to nicksinger
- Assignee deleted (nicksinger)
I'm not currently working on it
- Priority changed from Normal to High
This is blocking #162596, which is High, so this has to be High as well.
- Status changed from Workable to In Progress
(using sda for old disk and sdb for new disk for root fs here as it is shorter)
This describes how to move the root fs to the smaller disk (here sdb).
The script from salt will automatically use the other disk for /var/lib/openqa.
- be aware that there is a difference between partition UUID and filesystem UUID - use blkid to view both.
- unmounted /var/lib/openqa
- online-resized the existing btrfs filesystem and partition
- copied over the data using dd
- copied over the GPT table using sgdisk (e.g. sgdisk /dev/sda -R /dev/sdb)
- generated new disk and partition GUIDs (sgdisk -G /dev/sdb)
- generated new UUID for new btrfs filesystem: btrfstune -u /dev/sdb2
- generated new UUID for new vfat EFI partition: mlabel -s -n :: -i /dev/sdb1
- deal with the swap partition (make sure it is in the right place on the new disk)
- mount /dev/sdb2 /mnt
- replace old btrfs UUID with new one in /mnt/etc/fstab and /mnt/boot/grub/grub.cfg
- umount /boot/efi
- mount new EFI partition: mount /dev/sdb1 /boot/efi
- replace old btrfs UUID with new one in /boot/efi/EFI/opensuse/grub.cfg
- update bootloader in EFI vars using update-bootloader --install
- make sure you have the right boot partition in EFI using bootctl and efibootmgr -v
- if needed, remove the old boot entry using efibootmgr --delete -b XXXX
- reboot
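The steps above can be sketched as a dry run that only prints the commands. OLD_DISK, NEW_DISK, print_plan and replace_uuid are hypothetical names introduced here; the resize and dd steps are left out because no exact commands are given above, and sed -i assumes GNU sed.

```shell
#!/bin/sh
# Sketch only: prints the commands from the steps above, it does not run them.
# Device names follow the sda (old disk) / sdb (new disk) convention used above.
OLD_DISK=${OLD_DISK:-/dev/sda}
NEW_DISK=${NEW_DISK:-/dev/sdb}

print_plan() {
    cat <<EOF
# resize and dd copy omitted (no exact commands given)
sgdisk ${OLD_DISK} -R ${NEW_DISK}
sgdisk -G ${NEW_DISK}
btrfstune -u ${NEW_DISK}2
mlabel -s -n :: -i ${NEW_DISK}1
# filesystem UUID vs. partition UUID of the new root partition
blkid -s UUID -o value ${NEW_DISK}2
blkid -s PARTUUID -o value ${NEW_DISK}2
mount ${NEW_DISK}2 /mnt
umount /boot/efi
mount ${NEW_DISK}1 /boot/efi
update-bootloader --install
bootctl
efibootmgr -v
EOF
}

# Replace every occurrence of the old filesystem UUID with the new one in the given files.
# Usage: replace_uuid OLD_UUID NEW_UUID FILE...
replace_uuid() {
    old_uuid=$1
    new_uuid=$2
    shift 2
    # UUIDs consist of hex digits and dashes only, so they are safe inside a sed pattern
    sed -i "s/${old_uuid}/${new_uuid}/g" "$@"
}

print_plan
```

replace_uuid would be called with the old and new btrfs UUID (as reported by blkid) and the fstab and grub.cfg files named in the steps. Keeping a copy of each file before editing it is advisable.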
- Status changed from In Progress to Feedback
- Status changed from Feedback to Resolved