action #97409

open

openQA Project (public) - coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

openQA Project (public) - coordination #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry

Re-use existing filesystems on workers after reboot if possible to prevent full worker asset cache re-syncing

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: QA (public, currently private due to #173521) - future
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #96554. When workers reboot, on many of them the openQA pool is completely re-initialized on a new filesystem (ext2). This means that as soon as openQA jobs start, all of them need to sync the cache completely from OSD again. If many workers do that at the same time, this can cause considerable delay and maybe even I/O overload. We should try to re-use the filesystem along with the cache if possible, meaning not ext2 but probably ext4.


Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #96554: Mitigate on-going disk I/O alerts size:M (Resolved, mkittler, 2021-08-04)
