action #181784

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #157669: websockets+scheduler improvements to support more online worker instances

Improve scalability of openQA to be able to connect more worker slots

Added by mkittler 19 days ago. Updated 19 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2025-05-05
Due date:
% Done:

0%

Estimated time:

Description

When working on #168178 I noticed that two improvements can be made:

  • Improve scalability by only sending worker status on ws server ack
  • Extend connection limit of ws server to handle more workers
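
To illustrate the first point, here is a minimal Python sketch of ack-gated status sending. It is not the code from the pull request (openQA is written in Perl). The `Worker` and `Server` classes, the status names and the choice to keep only the newest unsent status are all assumptions made for illustration. The idea is that a worker never has more than one status message in flight, so a slow server is not flooded with updates it cannot process yet.

```python
from collections import deque


class Server:
    """Hypothetical stand-in for the ws server: queues statuses and acks each one."""

    def __init__(self):
        self.inbox = deque()
        self.processed = []

    def receive(self, worker, status):
        self.inbox.append((worker, status))

    def process_one(self):
        # Handle one queued status, then acknowledge it to the sender.
        worker, status = self.inbox.popleft()
        self.processed.append(status)
        worker.on_ack()


class Worker:
    """Hypothetical worker that sends a status only after the previous one was acked."""

    def __init__(self, server):
        self.server = server
        self.awaiting_ack = False
        self.pending = None  # newest status that has not been sent yet
        self.sent = 0

    def update_status(self, status):
        # Assumption: an unsent older status is simply superseded by the newer one.
        self.pending = status
        self._try_send()

    def on_ack(self):
        self.awaiting_ack = False
        self._try_send()

    def _try_send(self):
        # Do nothing while a message is in flight or there is nothing to send.
        if self.awaiting_ack or self.pending is None:
            return
        status, self.pending = self.pending, None
        self.awaiting_ack = True
        self.sent += 1
        self.server.receive(self, status)


def demo():
    server = Server()
    worker = Worker(server)
    # Four status changes arrive before the server processes anything.
    for status in ["s1", "s2", "s3", "s4"]:
        worker.update_status(status)
    in_flight = len(server.inbox)
    while server.inbox:
        server.process_one()
    return in_flight, worker.sent, server.processed
```

In this sketch `demo()` leaves only one message in flight at a time, and the server ends up processing two statuses instead of four.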

I made those changes (see https://github.com/os-autoinst/openQA/pull/6358) and tested them on OSD. They seem to have helped to some extent, as OSD was still operational for a few hours with over 1000 workers connected. (If I remember correctly, it was 1038 worker slots.)

This means the main reason why we previously noticed a rather hard limit on the maximum number of online workers was really just the default limit for those connections in upstream Mojolicious. The change to the status messages might also have helped a little.

I created this ticket to track the progress we made as I was not really working on the ACs of #168178.

Actions #1

Updated by mkittler 19 days ago

  • Description updated (diff)