action #162725
Closed · Parent: coordination #162716: [epic] Better use of storage on OSD workers
After w40 reconsider storage use for other OSD workers size:S
Description
Motivation
See #162719
Acceptance criteria
- AC1: All PRG2 x86_64 OSD workers have significantly more space than 500G for pool+cache combined using the existing physical storage devices
Acceptance tests
- AT1-1:
ssh osd sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa'
shows > 500G for all PRG2 workers
Suggestions
- Review what was done in #162719-15 manually and consider changing the mount points using the salt states in https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/openqa/nvme_store?ref_type=heads which prepare the devices accordingly (see the sketch below). If that is not feasible, apply the same change manually on all machines
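A minimal sketch of the two options, assuming the nvme_store states are part of the regular highstate and that re-applying it recreates the RAID and mount layout; device names and the filesystem type are assumptions to verify against the actual state files before running anything:
# Option 1: let salt re-apply the prepared states on the affected workers
ssh osd "sudo salt -C 'G@roles:worker and G@osarch:x86_64' state.apply"
# Option 2: manual fallback on a single worker (example device names,
# check lsblk first; this destroys the existing array and its data)
# mdadm --stop /dev/md127
# mdadm --create /dev/md/openqa --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
# mkfs.ext2 /dev/md/openqa
# mount /dev/md/openqa /var/lib/openqa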
Out of scope
Ordering any new physical storage devices
Updated by okurz 6 months ago
- Copied from action #162719: Ensure w40 has more space for worker pool directories size:S added
Updated by gpathak 2 months ago
Seems like the AC and AT are already fulfilled:
openqaworker17.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 617G 2.7T 19% /var/lib/openqa
openqaworker18.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 618G 2.7T 19% /var/lib/openqa
openqaworker16.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 611G 2.7T 19% /var/lib/openqa
worker36.oqa.prg2.suse.org:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 57G 835G 7% /var/lib/openqa
worker39.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 55G 837G 7% /var/lib/openqa
worker32.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 55G 836G 7% /var/lib/openqa
worker30.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 62G 830G 7% /var/lib/openqa
qesapworker-prg5.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 65G 14T 1% /var/lib/openqa
worker33.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 62G 830G 7% /var/lib/openqa
worker31.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 65G 827G 8% /var/lib/openqa
worker29.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 79G 813G 9% /var/lib/openqa
qesapworker-prg4.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 66G 14T 1% /var/lib/openqa
openqaworker14.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 2.5T 582G 1.8T 25% /var/lib/openqa
qesapworker-prg7.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 66G 14T 1% /var/lib/openqa
worker40.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 6.3T 55G 5.9T 1% /var/lib/openqa
worker35.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 59G 832G 7% /var/lib/openqa
worker34.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 61G 830G 7% /var/lib/openqa
qesapworker-prg6.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 59G 14T 1% /var/lib/openqa
sapworker1.qe.nue2.suse.org:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 62G 14T 1% /var/lib/openqa
The above output is from the command sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa' executed on OSD.
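For reference, a check restricted to just the PRG2 machines from AC1 could look like this, assuming the salt minion IDs match the FQDNs shown above:
# only x86_64 workers whose minion ID is in oqa.prg2.suse.org
ssh osd "sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64 and *.oqa.prg2.suse.org' cmd.run 'df -h /var/lib/openqa'"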
Updated by okurz 2 months ago
- Copied to action #168301: After w40 related problems reconsider storage use for all PRG2 based OSD workers added
Updated by okurz 2 months ago
Good check! Yeah, that's true. It seems what was overlooked as part of #162719 is that w40 probably temporarily lost the connection to one of its NVMes. Right now lsblk shows
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme1n1 259:0 0 476.9G 0 disk
└─md127 9:127 0 6.3T 0 raid0 /var/lib/openqa
nvme2n1 259:1 0 476.9G 0 disk
├─nvme2n1p1 259:2 0 512M 0 part /boot/efi
├─nvme2n1p2 259:3 0 293G 0 part /
…
nvme0n1 259:6 0 5.8T 0 disk
└─md127 9:127 0 6.3T 0 raid0 /var/lib/openqa
so a RAID0 is constructed from a 500GiB and a 6TiB device, which does not make much sense to me. So after all I think the approach from dheidler in #162719 was not enough, as it did not address the real problem. However, that should be handled in a separate dedicated ticket, which I have now created as #168301. Feel welcome to pick up and resolve this ticket then, as you did all that was necessary :)
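For anyone picking this up, a quick way to double-check which devices back the array (md127 taken from the lsblk output above; plain standard tooling, nothing openQA specific):
lsblk -o NAME,SIZE,TYPE,MOUNTPOINTS   # which physical devices feed which md array
cat /proc/mdstat                      # member devices and RAID level of each array
sudo mdadm --detail /dev/md127        # per-member breakdown of the existing RAID0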