coordination #103944

[saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

Added by okurz 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2021-08-06
Due date:
% Done: 25%
Estimated time: (Total: 0.00 h)
Difficulty:

Subtasks

action #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry (Status: New)

action #96623: Let workers declare themselves as broken if asset downloads are piling up size:M (Status: Resolved, Assignee: dheidler)

action #96684: Abort asset download via the cache service when related job runs into a timeout (or is otherwise cancelled) (Status: New)

openQA Infrastructure - action #97409: Re-use existing filesystems on workers after reboot if possible to prevent full worker asset cache re-syncing (Status: New)

openQA Infrastructure - action #97412: Reduce I/O load on OSD by using more cache size on workers with using free disk space when available instead of hardcoded space (Status: New)
