action #168244

Updated by okurz about 1 month ago

## Motivation 
 With #158125 we have a worker load limit, which helps, but there can still be cases like the one on 2024-10-14 on mania, where the load went far above the configured load limit. Regarding load15, one idea is to look at the combination of load values, e.g. only start jobs if `max(load1, load5, load15) < load_limit`.

 ## Acceptance criteria 
 * **AC1:** ppc workers consistently do not alert about too high load
 * **AC2:** The number of ppc worker instances is unchanged

 ## Suggestions 
 * Looking only at load15 has the problem that if many jobs start within a short time, load15 has not yet risen, so the load limit is not always effective. If we used `max(load1, load5, load15) < load_limit` instead, load1 or load5 would likely already be higher and block further jobs.
 * As an alternative, only start jobs if `max(load1, load5, load15) < load_limit || (load1 < load_limit && load1 < load5 && load5 < load15)`. The first part of the condition prevents overload when jobs are picked up within seconds or minutes of one another. The second part allows jobs to be picked up while the load is declining, even if load5 or load15 is still above the limit. This way we can set the load limit lower without forcing the worker to stay idle for too long.
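
 To make the two suggested conditions concrete, here is a minimal Python sketch. It is not the openQA worker implementation; the function names `below_limit_strict` and `may_start_job` are hypothetical and chosen for illustration only, and the load values are passed in as a plain `(load1, load5, load15)` tuple.

```python
from typing import Tuple

# (load1, load5, load15) as reported by the system
Load = Tuple[float, float, float]


def below_limit_strict(load: Load, load_limit: float) -> bool:
    """First suggestion: every load average must be below the limit."""
    return max(load) < load_limit


def may_start_job(load: Load, load_limit: float) -> bool:
    """Second suggestion: strict check OR a load that is clearly declining.

    "Declining" here means load1 < load5 < load15 with load1 already
    below the limit, exactly as written in the ticket's condition.
    """
    load1, load5, load15 = load
    all_below = max(load1, load5, load15) < load_limit
    declining = load1 < load_limit and load1 < load5 and load5 < load15
    return all_below or declining


if __name__ == "__main__":
    limit = 10.0
    # Burst of new jobs: load1 is already high while load15 is still low.
    print(may_start_job((12.0, 5.0, 3.0), limit))
    # Load is falling: load1 is below the limit, load5 and load15 are not.
    print(may_start_job((8.0, 12.0, 15.0), limit))
```

 On a Unix host the tuple could be obtained from `os.getloadavg()`. Note that with the second condition a worker resumes as soon as load1 drops below the limit during a decline, which is what permits a lower `load_limit` without long idle periods.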
