coordination #103944

open

[saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

Added by okurz over 2 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2021-08-06
Due date: 2023-05-10 (about 14 months late)
% Done: 85%
Estimated time: (Total: 0.00 h)

Subtasks 10 (4 open, 6 closed)

coordination #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry · Blocked · kraih · 2021-08-06 · 2023-05-10
  action #96623: Let workers declare themselves as broken if asset downloads are piling up size:M · Resolved · dheidler · 2021-08-06
  action #96684: Abort asset download via the cache service when related job runs into a timeout (or is otherwise cancelled) size:M · Rejected · mkittler · 2021-08-09
  openQA Infrastructure - action #97409: Re-use existing filesystems on workers after reboot if possible to prevent full worker asset cache re-syncing · New
  openQA Infrastructure - action #97412: Reduce I/O load on OSD by using more cache size on workers with using free disk space when available instead of hardcoded space · New
  action #125276: Ensure that the incomplete jobs with "cache service full" are properly restarted size:M · Resolved · mkittler · 2023-03-02
  action #128267: Restarting jobs (e.g. due to full cache queue) can lead to weird behavior for certain job dependencies (was: Ensure that the incomplete jobs with "cache service full" are properly restarted (take 2)) size:M · Resolved · mkittler · 2023-05-10
  action #128276: Handle workers with busy cache service gracefully by a two-level wait size:M · Resolved · mkittler · 2023-04-25
coordination #157144: [epic] Groups of worker classes: Regions, locations, etc. · New · 2024-03-13
  action #157147: Documentation for OSD worker region, location, datacenter keys in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls size:S · Resolved · mkittler · 2024-03-13
#1

Updated by okurz 3 months ago

  • Subtask #157144 added