Single-machine jobs starve clusters
Because the scheduler tries to fill every available slot, workers that can run
both multi-machine and single-machine jobs end up filled with single-machine
jobs: a cluster needs all of its slots free at the same time, so it rarely fits.
The only way a cluster fits is if 4 jobs finish within one scheduling round.
This is a tricky problem in general, but it has been solved before :)
#1 Updated by coolo over 1 year ago
The general idea is: whenever a job would be scheduled according to priority but can't be scheduled because of its cluster dependency, we increase a counter (or decrease the priority).
Once that counter reaches a limit (or the priority reaches 0), we reserve a worker slot for the job and don't allocate it to anything else until we have the full cluster.
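
A rough sketch of the counter variant, only to make the idea concrete. This is not the actual scheduler code. `Job`, `Scheduler`, `starvation_limit` and `simulate` are made-up names, the limit of 2 is arbitrary, and it assumes that a lower priority value is scheduled first and that `free_slots` counts only idle slots that are not already reserved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    priority: int                  # assumption: lower value is scheduled first
    cluster: Optional[str] = None  # None for a single-machine job


class Scheduler:
    def __init__(self, starvation_limit: int = 3):
        self.starvation_limit = starvation_limit  # hypothetical tunable
        self.skipped: Dict[str, int] = {}   # cluster -> rounds it was passed over
        self.reserved: Dict[str, int] = {}  # cluster -> slots held back for it

    def schedule_round(self, pending: List[Job], free_slots: int) -> List[Job]:
        """Return the jobs started this round; free_slots excludes reserved slots."""
        members: Dict[str, List[Job]] = {}
        for job in pending:
            if job.cluster is not None:
                members.setdefault(job.cluster, []).append(job)

        started: List[Job] = []
        handled = set()
        for job in sorted(pending, key=lambda j: j.priority):
            if job.cluster is None:
                if free_slots > 0:
                    free_slots -= 1
                    started.append(job)
                continue
            if job.cluster in handled:
                continue
            handled.add(job.cluster)

            held = self.reserved.get(job.cluster, 0)
            needed = len(members[job.cluster]) - held
            if needed <= free_slots:
                # The full cluster fits: release the reservation and start it.
                free_slots -= needed
                self.reserved.pop(job.cluster, None)
                self.skipped.pop(job.cluster, None)
                started.extend(members[job.cluster])
                continue
            if free_slots == 0:
                continue  # nothing was free, so priority did not pick this job
            # The job had its turn but the cluster did not fit: count it.
            self.skipped[job.cluster] = self.skipped.get(job.cluster, 0) + 1
            if self.skipped[job.cluster] >= self.starvation_limit:
                # Hold the free slots back so lower-priority jobs cannot take them.
                self.reserved[job.cluster] = held + free_slots
                free_slots = 0
        return started


def simulate(rounds: int = 5) -> List[List[str]]:
    """One slot frees up per round; a 4-job cluster competes with single jobs."""
    sched = Scheduler(starvation_limit=2)
    pending = [Job(f"mm{i}", 50, "c") for i in range(4)]
    pending += [Job(f"s{i}", 60) for i in range(10)]
    history = []
    for _ in range(rounds):
        started = sched.schedule_round(pending, free_slots=1)
        history.append([j.name for j in started])
        pending = [j for j in pending if j not in started]
    return history
```

In `simulate()` only one slot frees up per round, so without the reservation the 4-job cluster would never start. With it, the cluster is passed over once, then collects one slot per round and starts in round 5. The cost is that reserved slots sit idle while the cluster waits, so the limit trades throughput for fairness.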