action #181784
coordination #110833 (closed): [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
coordination #157669: websockets+scheduler improvements to support more online worker instances
Improve scalability of openQA to be able to connect more worker slots
Description
When working on #168178 I noticed that two improvements can be made:
- Improve scalability by only sending worker status on ws server ack
- Extend connection limit of ws server to handle more workers
I made those changes (see https://github.com/os-autoinst/openQA/pull/6358) and tested them on OSD. They seem to have helped to some extent, as OSD was still operational for a few hours with over 1000 workers connected. (If I remember correctly, it was 1038 worker slots.)
This means the main reason why we previously noticed a rather hard limit on the number of online workers was really just the default connection limit of upstream Mojolicious. The change to the status messages might also have helped a little.
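To illustrate the first improvement, here is a minimal sketch of ack-gated status sending. It is not the code from the pull request. The class `StatusSender` and its methods are hypothetical names chosen for illustration, and the coalescing of skipped updates is an assumption about one reasonable way to do it.

```python
class StatusSender:
    """Hypothetical helper: send a new status only after the previous one was acked."""

    def __init__(self, send):
        self._send = send            # callable that transmits one status message
        self._awaiting_ack = False   # True while a sent status is not yet acknowledged
        self._pending = None         # latest status held back while waiting for the ack

    def update(self, status):
        """Request sending a status; it is held back if an ack is still outstanding."""
        if self._awaiting_ack:
            # Assumption: only the most recent status matters, so older ones are dropped.
            self._pending = status
            return
        self._transmit(status)

    def on_ack(self):
        """Call when the server acknowledged the last status message."""
        self._awaiting_ack = False
        if self._pending is not None:
            status, self._pending = self._pending, None
            self._transmit(status)

    def _transmit(self, status):
        self._awaiting_ack = True
        self._send(status)


sent = []
sender = StatusSender(sent.append)
sender.update("idle")      # sent immediately
sender.update("working")   # held back, no ack yet
sender.update("free")      # replaces the held-back "working"
sender.on_ack()            # ack arrives, latest held-back status goes out
```

With this pattern each worker has at most one unacknowledged status message in flight, so a slow server sees fewer messages instead of a growing backlog.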
I created this ticket to track the progress we made as I was not really working on the ACs of #168178.