action #129484

closed

openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project - coordination #108209: [epic] Reduce load on OSD

high response times on osd - Move OSD workers to o3 to prevent OSD overload size:M

Added by okurz 12 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
2023-05-17
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Motivation

OSD can be overloaded by too many jobs uploading too much data. We have more workers pending installation, so the situation can become even worse. On the other hand, o3 sometimes runs into capacity limits when a new Tumbleweed snapshot, Leap and staging tests come together. So we should logically move some machines in NUE1-SRV1 from OSD to o3.

Acceptance criteria

  • AC1: o3 has more workers than before, e.g. 1-2 extra workers
  • AC2: Both the wiki listing of o3 workers and racktables are up to date

Suggestions


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) Resolved mkittler 2021-01-18

Copied to openQA Project - action #129487: high response times on osd - Limit the number of concurrent job upload handling on webUI side. Can we use a semaphore or lock using the database? size:M Rejected okurz
