action #162725
closed
coordination #162716: [epic] Better use of storage on OSD workers
After w40 reconsider storage use for other OSD workers size:S
Added by okurz 6 months ago.
Updated 2 months ago.
Category:
Feature requests
Description
Motivation¶
See #162719
Acceptance criteria¶
- AC1: All PRG2 x86_64 OSD workers have significantly more space than 500G for pool+cache combined using the existing physical storage devices
Acceptance tests¶
- AT1-1:
ssh osd sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa'
shows > 500G for all PRG2 workers
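A minimal sketch of the same check narrowed to PRG2 hosts only, assuming all PRG2 x86_64 workers follow the worker*.oqa.prg2.suse.org naming scheme (Salt compound matchers accept a hostname glob next to the grain matches):
ssh osd "sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64 and worker*.oqa.prg2.suse.org' cmd.run 'df -h /var/lib/openqa'"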
Suggestions¶
Out of scope¶
Ordering any new physical storage devices
- Copied from action #162719: Ensure w40 has more space for worker pool directories size:S added
- Target version changed from future to Tools - Next
- Subject changed from After w40 reconsider storage use for other OSD workers to After w40 reconsider storage use for other OSD workers size:S
- Description updated (diff)
- Status changed from New to Workable
- Target version changed from Tools - Next to Ready
Seems like the AC and AT are already fulfilled:
openqaworker17.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 617G 2.7T 19% /var/lib/openqa
openqaworker18.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 618G 2.7T 19% /var/lib/openqa
openqaworker16.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 3.5T 611G 2.7T 19% /var/lib/openqa
worker36.oqa.prg2.suse.org:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 57G 835G 7% /var/lib/openqa
worker39.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 55G 837G 7% /var/lib/openqa
worker32.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 55G 836G 7% /var/lib/openqa
worker30.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 62G 830G 7% /var/lib/openqa
qesapworker-prg5.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 65G 14T 1% /var/lib/openqa
worker33.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 62G 830G 7% /var/lib/openqa
worker31.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 65G 827G 8% /var/lib/openqa
worker29.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 79G 813G 9% /var/lib/openqa
qesapworker-prg4.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 66G 14T 1% /var/lib/openqa
openqaworker14.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 2.5T 582G 1.8T 25% /var/lib/openqa
qesapworker-prg7.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 66G 14T 1% /var/lib/openqa
worker40.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 6.3T 55G 5.9T 1% /var/lib/openqa
worker35.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 59G 832G 7% /var/lib/openqa
worker34.oqa.prg2.suse.org: <--
Filesystem Size Used Avail Use% Mounted on
/dev/md127 939G 61G 830G 7% /var/lib/openqa
qesapworker-prg6.qa.suse.cz:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 59G 14T 1% /var/lib/openqa
sapworker1.qe.nue2.suse.org:
Filesystem Size Used Avail Use% Mounted on
/dev/md127 14T 62G 14T 1% /var/lib/openqa
The above output is from the command sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -h /var/lib/openqa' executed on OSD.
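As a hedged sketch (assuming GNU coreutils df on the workers and Salt's plain-text output of a hostname line followed by the indented command output, as in the listing above), the comparison against the 500G threshold for the PRG2 hosts could also be scripted rather than read off manually:
ssh osd "sudo salt --no-color -C 'G@roles:worker and G@osarch:x86_64' cmd.run 'df -BG --output=size /var/lib/openqa | tail -n1'" \
  | awk '/prg2/        { host = $1 }   # remember the minion header line of a PRG2 worker
         /G$/ && host  { print host, $1, ($1+0 > 500 ? "OK" : "below 500G"); host = "" }'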
- Copied to action #168301: After w40 related problems reconsider storage use for all PRG2 based OSD workers added
Good check! Yeah, that's true. It seems what was overlooked as part of #162719 is that w40 probably temporarily lost the connection to one of its NVMes. Right now lsblk shows
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme1n1 259:0 0 476.9G 0 disk
└─md127 9:127 0 6.3T 0 raid0 /var/lib/openqa
nvme2n1 259:1 0 476.9G 0 disk
├─nvme2n1p1 259:2 0 512M 0 part /boot/efi
├─nvme2n1p2 259:3 0 293G 0 part /
…
nvme0n1 259:6 0 5.8T 0 disk
└─md127 9:127 0 6.3T 0 raid0 /var/lib/openqa
so a RAID0 is constructed from a 500GiB and a 6TiB device, which does not make much sense to me. So after all I think the approach from dheidler in #162719 was not enough, as it did not address the real problem. However, that should be handled in a separate dedicated ticket, which I have now created as #168301. Feel welcome to pick up and resolve this ticket then, as you did all that was necessary :)
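A hedged sketch of how the member size imbalance could be confirmed directly on w40 (standard mdadm and lsblk invocations; the device names are taken from the lsblk output above):
# Show RAID level, member devices and their sizes for the array
sudo mdadm --detail /dev/md127
# Only the sizes of the two member NVMes
lsblk -o NAME,SIZE,TYPE /dev/nvme0n1 /dev/nvme1n1
If the member sizes differ as drastically as shown above, it is the array layout rather than the filesystem that needs to be revisited, which is the scope of #168301.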
- Status changed from Workable to In Progress
- Assignee set to gpathak
- Status changed from In Progress to Resolved
This ticket is resolved per the AC and AT.