action #168178
Updated by mkittler 1 day ago
## Motivation
With #157690 the number of connected online workers is already limited based on a configuration variable. We can extend that to also limit based on the actual websocket+scheduler load, i.e. keep the number of workers low enough to ensure proper operation of websocket+scheduler and prevent problems like #157666.
## Acceptance criteria
* **AC1:** A clear definition of "websocket+scheduler load" exists
* **AC2:** The number of online workers is limited to `min(configured_number,configured_load_limit)`
* **AC3:** openQA workers rejected for exceeding the mentioned limit(s) explicitly log that situation or fail
## Suggestions
* Look into the implementation of #157690 to see how the simple limit has been implemented so far
* Come up with a definition of the critical websocket+scheduler load based on "overload experiments" which can be used as a metric for the problem seen in #157666
* Extend the simple limit with a lookup of said metric and also prevent additional worker connections based on it
* Also consider disconnecting already connected workers if the metric exceeds the configured threshold
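
As a starting point for discussion, the last two suggestions could be sketched roughly as follows. This is only an illustration in Python, not openQA code: `LimitConfig`, `max_online_workers`, `load_threshold`, `may_accept_worker` and `workers_to_disconnect` are hypothetical names, and `load` stands in for whatever metric AC1 ends up defining.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LimitConfig:
    # Both setting names are hypothetical, not actual openQA configuration keys.
    max_online_workers: int  # simple count limit in the spirit of #157690
    load_threshold: float    # critical value of the yet-to-be-defined load metric


def may_accept_worker(online_workers: int, load: float, cfg: LimitConfig) -> Tuple[bool, str]:
    """Decide whether one more worker may connect; returns (accepted, reason)."""
    if online_workers >= cfg.max_online_workers:
        return False, f"limit of {cfg.max_online_workers} online workers reached"
    if load >= cfg.load_threshold:
        return False, f"load {load:.2f} at or above threshold {cfg.load_threshold:.2f}"
    return True, "ok"


def workers_to_disconnect(online_workers: int, load: float, cfg: LimitConfig, step: int = 1) -> int:
    """Return how many connected workers to drop while the metric exceeds the threshold."""
    if load <= cfg.load_threshold:
        return 0
    # Drop at most `step` workers per check so the load can settle in between.
    return min(step, online_workers)
```

The reason string returned on rejection is meant to be what the rejected worker logs (AC3). Disconnecting in small steps is just one possible choice; whether that is gentle enough would have to come out of the overload experiments.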
## Rollback steps
* Ensure `sapworker2.qe.nue2.suse.org` is powered down, as it is/was used to create many workers while working on this ticket.