action #167557

closed

openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project - coordination #157669: websockets+scheduler improvements to support more online worker instances

OSD not starting new jobs on 2024-09-28 due to >1k worker instances connected, overloading websocket server

Added by okurz about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-09-28
Due date:
% Done: 0%
Estimated time:
Description

Observation

As of 2024-09-28 11:07Z, https://openqa.suse.de/tests shows only 1 job running and 4k jobs scheduled; no new jobs are being picked up for execution. https://openqa.suse.de/admin/workers shows 1003 worker instances online, which might be overloading the websocket server, see #110833.
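The check behind this observation can be sketched roughly as follows. This is only an illustration, not openQA code: the record layout (a list of dicts with an `online` flag), the helper names `count_online` and `exceeds_limit`, and the limit of 1000 (taken from the ">1k worker instances" wording in the title) are all assumptions.

```python
from typing import Iterable, Mapping

# Assumed threshold, based on the ">1k worker instances" wording in the ticket title.
WORKER_LIMIT = 1000


def count_online(workers: Iterable[Mapping[str, object]]) -> int:
    """Count worker records whose hypothetical "online" flag is truthy."""
    return sum(1 for worker in workers if worker.get("online"))


def exceeds_limit(workers: Iterable[Mapping[str, object]],
                  limit: int = WORKER_LIMIT) -> bool:
    """Return True if strictly more than `limit` workers are online."""
    return count_online(workers) > limit


if __name__ == "__main__":
    # Made-up sample data mirroring the 1003 online instances reported above.
    sample = [{"id": i, "online": True} for i in range(1003)]
    print(count_online(sample), exceeds_limit(sample))
```

With real data, the worker records would have to come from wherever the worker list is available, and the field that marks a worker as online would need to be adjusted to match.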


Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure - action #167164: osd-deployment | Minions returned with non-zero exit code (qesapworker-prg5.qa.suse.cz) size:M (Resolved, ybonatakis)

Related to openQA Project - action #157690: Simple global limit of registered/online workers size:M (Resolved, mkittler, 2024-03-21)

Related to openQA Infrastructure - action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 (Resolved, okurz, 2024-03-12)
