action #168301
coordination #162716 (open): [epic] Better use of storage on OSD workers
After w40-related problems reconsider storage use for all PRG2-based OSD workers
Start date: 2024-06-21
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Motivation
See #162719 and #162725. It seems that what was overlooked as part of #162719 is that w40 probably temporarily lost the connection to one of its NVMes. Right now lsblk shows
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme1n1     259:0    0 476.9G  0 disk
└─md127       9:127  0   6.3T  0 raid0 /var/lib/openqa
nvme2n1     259:1    0 476.9G  0 disk
├─nvme2n1p1 259:2    0   512M  0 part  /boot/efi
├─nvme2n1p2 259:3    0   293G  0 part  /
…
nvme0n1     259:6    0   5.8T  0 disk
└─md127       9:127  0   6.3T  0 raid0 /var/lib/openqa
so a RAID0 is assembled from a 500 GiB and a 6 TiB device, which does not make much sense to me. So after all I think the approach from dheidler in #162719 was not enough, as it did not address the real problem.
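To confirm what actually backs the array before changing anything, a few read-only checks could be run on w40. This is only a sketch: the device names are taken from the lsblk output above, and smartctl assumes smartmontools is installed.

# Show member devices, level and size of the existing array
sudo mdadm --detail /dev/md127
# Cross-check reported sizes and mount points of the individual NVMe devices
lsblk -o NAME,SIZE,TYPE,MOUNTPOINTS /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
# Inspect the suspiciously small device that may have dropped and re-appeared
sudo smartctl -i /dev/nvme1n1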
Acceptance criteria
- AC1: All PRG2 x86_64 OSD workers use their existing local storage more efficiently
Acceptance tests
- AT1-1: The command
  ssh osd sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa'
  shows > 1 TB for all PRG2 workers (a size-only variant is sketched below)
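A size-only variant of the same salt call makes it easier to spot workers below the 1 TB target. This is just a sketch; the quoting and the --output option assume GNU df on the workers.

ssh osd "sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h --output=size /var/lib/openqa | tail -n1'"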
Suggestions
- Review what was done in #162719-15 manually; consider changing the mount points using the salt states in https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/openqa/nvme_store?ref_type=heads that prepare devices accordingly. If that is not feasible, apply the same changes manually to all machines (see the sketch below)
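If the manual route is taken, the rough sequence per worker could look like the following. This is only a sketch: the device names, the single-device RAID0 layout, the ext4 filesystem and the openqa-worker.target unit name are assumptions, the nvme_store salt state linked above is the authoritative reference, and the data in /var/lib/openqa is destroyed.

# Stop workers and unmount the pool
sudo systemctl stop openqa-worker.target
sudo umount /var/lib/openqa
# Tear down the mis-assembled array and wipe RAID metadata from its members
sudo mdadm --stop /dev/md127
sudo mdadm --zero-superblock /dev/nvme0n1 /dev/nvme1n1
# Re-create the array from the device(s) that should actually back the pool
# (--force is needed for a single-device RAID0; adjust to the real layout)
sudo mdadm --create /dev/md/openqa --level=0 --force --raid-devices=1 /dev/nvme0n1
# Create a filesystem, mount it and restart the workers
sudo mkfs.ext4 /dev/md/openqa
sudo mount /dev/md/openqa /var/lib/openqa
sudo systemctl start openqa-worker.target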
Out of scope
Ordering any new physical storage devices