action #168301
coordination #162716: [epic] Better use of storage on OSD workers
After w40-related problems reconsider storage use for all PRG2-based OSD workers
Description
Motivation¶
See #162719 and #162725. It seems what was overlooked as part of #162719 is that w40 probably temporarily lost the connection to one of its NVMes. Right now lsblk shows
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme1n1       259:0    0 476.9G  0 disk
└─md127         9:127  0   6.3T  0 raid0 /var/lib/openqa
nvme2n1       259:1    0 476.9G  0 disk
├─nvme2n1p1   259:2    0   512M  0 part  /boot/efi
├─nvme2n1p2   259:3    0   293G  0 part  /
…
nvme0n1       259:6    0   5.8T  0 disk
└─md127         9:127  0   6.3T  0 raid0 /var/lib/openqa
so a RAID0 is constructed between a 500 GiB and a 6 TiB device, which does not make much sense to me. So after all I think the approach from dheidler in #162719 was not enough, as it was not addressing the real problem.
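Before changing anything, the actual composition of md127 can be verified directly on the worker (a quick sketch; the device names are taken from the lsblk output above and may differ on other machines):
cat /proc/mdstat                           # active md arrays and their member devices
mdadm --detail /dev/md127                  # RAID level, size and component devices of the array
mdadm --examine /dev/nvme0n1 /dev/nvme1n1  # per-device md superblocks of the current members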
Acceptance criteria¶
- AC1: All PRG2 x86_64 OSD workers use their existing local storage more efficiently
Acceptance tests¶
- AT1-1:
  ssh osd sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa'
  shows > 1 TB for all PRG2 workers
Suggestions¶
- Review what was done in #162719-15 manually, then consider changing the mount points using the salt states in https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/openqa/nvme_store?ref_type=heads that prepare the devices accordingly (see the dry-run sketch below). If that is not feasible, apply the same changes manually to all machines
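One possible way to evaluate the salt-state route safely (a sketch; it assumes the nvme_store directory is applied as the state id openqa.nvme_store and that 'worker40*' matches the intended minion, both of which would need to be confirmed):
ssh osd sudo salt --no-color 'worker40*' state.apply openqa.nvme_store test=True   # dry-run on a single worker first
# if the preview looks right, drop test=True and widen the target to the remaining PRG2 x86_64 workers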
Out of scope¶
Ordering any new physical storage devices
Updated by okurz 2 months ago
- Copied from action #162725: After w40 reconsider storage use for other OSD workers size:S added
Updated by gpathak 2 months ago · Edited
okurz wrote:
so a RAID0 is constructed between a 500GiB and a 6TiB device which does not make much sense to me. So after all I think the approach from dheidler in #162719 was not enough as it was not addressing the real problem.
Seems like we aim to move the root filesystem on all other workers to the smallest storage device (e.g. on w40 either nvme1n1 or nvme2n1, both 477 GB), while keeping /var/lib/openqa on the largest available storage (either nvme2n1 or the combination of nvme1n1 and nvme0n1). If this is correct, can we update/add something in the acceptance criteria for clarity?
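If that interpretation is agreed on, the smallest and largest device per worker could be listed up front to make the criteria concrete (a sketch reusing the grain targeting from AT1-1):
ssh osd sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'lsblk -d -o NAME,SIZE,MODEL'   # whole disks only, no partitions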