action #97409

openQA Project - coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

openQA Project - action #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry

Re-use existing filesystems on workers after reboot if possible to prevent full worker asset cache re-syncing

Added by okurz 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #96554. When workers reboot, for many of them the openQA pool is completely re-initialized on a new filesystem (ext2), which means that as soon as openQA jobs start, all of them need to sync the cache completely from OSD again. If many workers do that at the same time, this can cause quite some delay and maybe even I/O overload. We should try to re-use the filesystem along with the cache if possible, meaning not ext2 but probably ext4.
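
To illustrate the idea, here is a minimal sketch of the "re-use if possible" decision. It is not taken from the actual worker setup: the function names and the set of reusable filesystem types are assumptions made up for this example. It only relies on `blkid -o value -s TYPE` to read the existing filesystem type and on `mkfs.ext4 -F` to create a new one.

```python
import subprocess
from typing import Optional

# Assumption for this sketch: only an existing ext4 filesystem is considered
# worth keeping; anything else (including ext2 or no filesystem) is recreated.
REUSABLE_FS_TYPES = {"ext4"}


def detect_fs_type(device: str) -> Optional[str]:
    """Return the filesystem type blkid reports for device, or None if none is found."""
    result = subprocess.run(
        ["blkid", "-o", "value", "-s", "TYPE", device],
        capture_output=True,
        text=True,
        check=False,  # blkid exits non-zero when it finds no filesystem
    )
    fs_type = result.stdout.strip()
    return fs_type or None


def needs_mkfs(fs_type: Optional[str]) -> bool:
    """Decide whether the device has to be formatted before mounting."""
    return fs_type not in REUSABLE_FS_TYPES


def prepare_pool_device(device: str) -> None:
    """Format the device only if no reusable filesystem is present (hypothetical helper)."""
    if needs_mkfs(detect_fs_type(device)):
        # Destroys existing data, so the worker cache is lost in this branch.
        subprocess.run(["mkfs.ext4", "-F", device], check=True)
```

With this logic a reboot would only trigger a full cache re-sync when the device really has no usable filesystem. Whether the existing filesystem is also healthy enough to keep (for example after an unclean shutdown) is not checked here and would need additional handling.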


Related issues

Copied from openQA Infrastructure - action #96554: Mitigate on-going disk I/O alerts size:M (Resolved, 2021-08-04)

History

#1 Updated by okurz 5 months ago

  • Copied from action #96554: Mitigate on-going disk I/O alerts size:M added

#2 Updated by mkittler 5 months ago

  • Parent task changed from #96447 to #98463

#96447 doesn't have a very meaningful ticket description, so I'm replacing the parent ticket with #98463.
