action #112001
closedcoordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
action #111908: Multimachine failures between multiple physical workers
[timeboxed:20h][spike solution] Pin multi-machine cluster jobs to same openQA worker host based on configuration
Description
Motivation
In general, openQA supports multi-machine clusters whose jobs run on different physical hosts, e.g. qemu VMs on two different physical hosts connected by GRE tunnels. For reasons that are often unknown, such combinations might not provide stable test results. To mitigate this, we should provide a means to ask the openQA scheduler to restrict jobs that are part of a multi-machine cluster to a single physical host, if configured accordingly.
Acceptance criteria
- AC1: Given multiple worker hosts matching the same worker class, When multi-machine parallel cluster jobs are scheduled And configured to be restricted to a single worker host, Then all parallel jobs are assigned to the same worker host
- AC2: By default multi-machine cluster jobs are still scheduled on multiple physical hosts
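
The behaviour described in AC1 and AC2 could be sketched roughly as follows. This is only an illustrative model in Python and not the openQA scheduler code: the `Worker` type, the `assign_cluster` function and the `single_host` switch are hypothetical names, and each worker slot is assumed to offer exactly one worker class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    host: str          # physical worker host (hypothetical field)
    worker_class: str  # simplification: one worker class per slot


def _match(job_classes, pool):
    """Give each job a distinct free slot from pool, or return None."""
    remaining = list(pool)
    assignment = {}
    for index, wanted in enumerate(job_classes):
        slot = next((w for w in remaining if w.worker_class == wanted), None)
        if slot is None:
            return None  # the pool cannot serve the whole cluster
        remaining.remove(slot)
        assignment[index] = slot
    return assignment


def assign_cluster(job_classes, free_workers, single_host=False):
    """Return {job_index: Worker}, or None if the cluster cannot be placed.

    single_host=False (assumed default, AC2): any free slot may be used,
    so the cluster can span several hosts.
    single_host=True (AC1): only the slots of one host are considered at a
    time, so all jobs of the cluster end up on the same host.
    """
    if single_host:
        by_host = defaultdict(list)
        for worker in free_workers:
            by_host[worker.host].append(worker)
        pools = [by_host[host] for host in sorted(by_host)]
    else:
        pools = [list(free_workers)]
    for pool in pools:
        assignment = _match(job_classes, pool)
        if assignment is not None:
            return assignment
    return None
```

With two free `tap` slots on one host and one on another, a two-job cluster is placed entirely on the host with two slots when `single_host=True`, and may be split across hosts otherwise. If no single host has enough free slots, the sketch returns `None`, which corresponds to leaving the cluster unscheduled for now.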
Workaround
If you run into problems with parallel jobs that run on multiple worker hosts, select a worker class that is provided by only a single worker host at a time.
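
To find worker classes that qualify for this workaround, one could check which classes are offered by exactly one host. The following is a minimal sketch under assumptions: the helper `single_host_classes` and the host and class names in the usage example are hypothetical, and the mapping of hosts to worker classes has to be collected by other means.

```python
from collections import defaultdict


def single_host_classes(host_classes):
    """Return the worker classes that are provided by exactly one host.

    host_classes maps a host name to the worker classes of its workers.
    """
    hosts_per_class = defaultdict(set)
    for host, classes in host_classes.items():
        for worker_class in classes:
            hosts_per_class[worker_class].add(host)
    return sorted(c for c, hosts in hosts_per_class.items() if len(hosts) == 1)


# Hypothetical setup: only "host-a" offers a host-specific class.
example = {
    "host-a": ["qemu_x86_64", "tap", "host-a"],
    "host-b": ["qemu_x86_64", "tap"],
}
candidates = single_host_classes(example)
```

Here `candidates` contains only the class that a single host provides, so jobs requesting it cannot be spread across hosts.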
Updated by okurz 5 months ago
- Related to action #135035: Optionally restrict multimachine jobs to a single worker added
Updated by okurz 5 months ago
- Related to action #158146: Prevent scheduling across-host multimachine clusters to hosts that are marked to exclude themselves size:M added
Updated by okurz 5 months ago
- Related to action #158143: Make workers unassign/reject/incomplete jobs when across-host multimachine setup is requested but not available added