coordination #157669 (open)

Parent: coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

websockets+scheduler improvements to support more online worker instances

Added by okurz 9 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2023-08-31
Due date:
% Done: 37%
Estimated time: (Total: 0.00 h)

Subtasks: 8 (5 open, 3 closed)

  • action #134924: Websocket server overloaded, affected worker slots shown as "broken" with graceful disconnect in workers table (New, 2023-08-31)
  • action #157675: Optimize openqa-scheduler database queries, e.g. "SELECT value FROM worker_properties..." (New, 2024-03-21)
  • action #157681: Profiling using NYTProf for openqa-websockets and openqa-scheduler (New, 2024-03-21)
  • action #157684: cycle execution health check in openqa-scheduler (New, 2024-03-21)
  • action #157690: Simple global limit of registered/online workers size:M (Resolved, mkittler, 2024-03-21)
  • openQA Infrastructure (public) - action #167557: OSD not starting new jobs on 2024-09-28 due to >1k worker instances connected, overloading websocket server (Resolved, okurz, 2024-09-28)
  • action #168178: Limit connected online workers based on websocket+scheduler load size:M (Workable)
  • action #168502: Check for high websockets load on o3 2024-10-20 (Resolved, okurz, 2024-10-20)

Related issues: 2 (0 open, 2 closed)

  • Related to openQA Infrastructure (public) - action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 (Resolved, okurz, 2024-03-12)
  • Related to openQA Infrastructure (public) - action #157726: osd-deployment | Failed pipeline for master (worker3[6-9].oqa.prg2.suse.org) (Resolved, okurz, 2024-03-18)

#1

Updated by okurz 9 months ago

  • Subtask #134924 added

#2

Updated by okurz 9 months ago

  • Subtask #157675 added

#3

Updated by okurz 9 months ago

  • Subtask #157681 added

#4

Updated by okurz 9 months ago

  • Subtask #157684 added

#5

Updated by okurz 9 months ago

  • Related to action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 added

#6

Updated by okurz 9 months ago

  • Subtask #157690 added

#7

Updated by okurz 9 months ago

  • Related to action #157726: osd-deployment | Failed pipeline for master (worker3[6-9].oqa.prg2.suse.org) added

#8

Updated by okurz 3 months ago

  • Subtask #167557 added

#9

Updated by okurz 3 months ago

  • Subject changed from "websockets+scheduler improvements" to "websockets+scheduler improvements to support more online worker instances"

#10

Updated by okurz 2 months ago

  • Subtask #168178 added

#11

Updated by okurz 2 months ago

  • Subtask #168502 added