[tools][scheduler] Multi-machine jobs with higher priority do not get workers to run on.
During the 15sp1 beta1 test, the multi-machine jobs (2 SUTs) in the virtualization job groups could not get workers to kick off until some low-priority single-machine jobs had finished. This especially delays the acceptance test, which was not able to finish within 24 hours.
I did not open a ticket when I first found this, because my understanding was the following:
although these jobs have a higher priority within our group, IPMI jobs in other job groups may have an even higher priority. Those jobs got the machines first, and because they did not all finish at the same time, our multi-machine jobs still could not start, while other lower-priority single-machine jobs did.
okurz commented in https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/148/ that the openQA tooling should be enhanced to handle this case.
Please help to evaluate this. I would really appreciate it.
#1 Updated by coolo over 3 years ago
- Status changed from New to Rejected
Priority 30 (your group) means that (roughly) 30 chances need to pass before a job is taken without another peer. We won't leave IPMI machines idle just because a prio 30 job is around. If you want your jobs to take it all, you would need to set the prio much lower.
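To illustrate the "chances" idea above, here is a minimal toy model in Python. It is not openQA's actual scheduler code: the names `ClusterJob` and `ticks_until_cluster_starts` are hypothetical, and the model assumes that exactly one worker frees up per scheduling tick, that a single-machine job is always waiting to grab it, and that a lower priority value means a more urgent job.

```python
from dataclasses import dataclass


@dataclass
class ClusterJob:
    """Hypothetical multi-machine job that needs several workers at once."""
    priority: int            # lower value = more urgent (assumed convention)
    workers_needed: int = 2  # e.g. 2 SUTs
    missed_chances: int = 0  # scheduling chances that passed without a start


def ticks_until_cluster_starts(job: ClusterJob, max_ticks: int = 1000) -> int:
    """Return the tick at which the cluster job can finally start.

    Toy assumptions: one worker becomes free per tick, and a
    single-machine job is always queued and ready to take it.
    """
    held = 0  # workers kept idle and reserved for the cluster job
    for tick in range(1, max_ticks + 1):
        if job.missed_chances >= job.priority:
            # Enough chances have passed: reserve the freed worker.
            held += 1
            if held == job.workers_needed:
                return tick
        else:
            # The freed worker goes to a single-machine job instead.
            job.missed_chances += 1
    raise RuntimeError("cluster job never started")


if __name__ == "__main__":
    for prio in (30, 20):
        print(prio, ticks_until_cluster_starts(ClusterJob(priority=prio)))
```

In this sketch a prio 30 cluster job waits through 30 missed chances before workers start being held for it, while a prio 20 job waits through only 20, which is why lowering the value shortens the wait.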
#3 Updated by xlai over 3 years ago
On second thought: do you have more information about the other jobs' priorities? One thing we could improve is how much impact the priority difference has. I don't think we care atm.
I am not so sure. I checked the job history of the workers with class virt-mm-64bit-ipmi. It seems most jobs launched before the multi-machine jobs were in the virtualization group: some with the same priority 30 and some with priority 50 (single-machine jobs, which have the lowest priority in our group). Please note that the virtualization-milestone job group also has some multi-machine jobs with priority 40, which were scheduled only after nearly all other prio 50 single-machine jobs were done.
#6 Updated by xlai over 3 years ago
After changing the priority to 20, the guest migration jobs always got SUTs to run on in the recent beta2 candidates, so the original MR https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/148/ can be closed.
Also, big thanks for the quick fix.