action #158146

Updated by livdywan about 2 months ago

## Motivation 
 Multi-machine jobs have been failing since 2023-08-14 because of a misconfiguration of the MTU/GRE tunnels. As a workaround, complete multi-machine tests were forced to run on the same worker host. In #135035 we added a feature flag that limits jobs to a single physical host. It can be used for debugging, as a temporary workaround, or when the network design prevents multiple hosts from being interconnected by GRE tunnels. By default, however, when multi-machine jobs are scheduled with worker classes that are fulfilled by multiple hosts which might not be properly interconnected, nothing prevents workers from picking up such clusters. This causes hard-to-investigate openQA job failures, which we should try to prevent. Can we propagate test variables like the "limit to one host only" feature flag in worker properties so that the openQA scheduler can see that flag before assigning jobs to workers?

 ## Acceptance Criteria 
 * **AC1:** The openQA scheduler does not schedule cross-host multi-machine clusters to any host that has the feature flag from #135035, or an equivalent setting, enabled (considering the proposals in #157144-2)
 * **AC2:** By default, jobs of a multi-machine parallel cluster can still be scheduled across multiple hosts

 ## Suggestions 
 * Look into what was done in #135035 and apply the same idea to the central openQA scheduler
 * Investigate whether any worker properties are already readable by the openQA scheduler at scheduling time. At least it knows about the worker class already, right? Should we translate the feature flag from #135035 into a "special worker class" that acts as an exclusive class implemented by only one host at a time?
 * Consider proposals in #157144-2 regarding using a special worker class *or* directly the flag from #135035 `PARALLEL_ONE_HOST_ONLY=1` 
 * Ensure that the scheduler does not schedule cross-host multi-machine clusters to any host that has such a special worker class or worker property
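
 To make the intended scheduler behaviour more concrete, here is a rough Python sketch of the check described in the suggestions above. It is only an illustration, not openQA code: the `Worker` class and the functions `one_host_only`, `assignment_allowed` and `pick_workers` are hypothetical names, and it assumes the flag `PARALLEL_ONE_HOST_ONLY=1` would be visible to the scheduler as a worker property.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Flag name taken from #135035; exposing it as a worker property is an assumption.
ONE_HOST_FLAG = "PARALLEL_ONE_HOST_ONLY"


@dataclass
class Worker:
    """Hypothetical, simplified view of a worker slot as seen by the scheduler."""
    id: int
    host: str
    properties: dict = field(default_factory=dict)


def one_host_only(worker):
    """True if the worker advertises the one-host-only flag."""
    return str(worker.properties.get(ONE_HOST_FLAG, "0")) == "1"


def assignment_allowed(workers):
    """AC1: a cluster spanning several hosts must not include a flagged worker."""
    hosts = {w.host for w in workers}
    if len(hosts) <= 1:
        return True
    return not any(one_host_only(w) for w in workers)


def pick_workers(cluster_size, free_workers):
    """Return workers for a parallel cluster, or None if no valid choice exists."""
    by_host = defaultdict(list)
    for w in free_workers:
        by_host[w.host].append(w)
    # A single host that can take the whole cluster is always a valid choice.
    for workers in by_host.values():
        if len(workers) >= cluster_size:
            return workers[:cluster_size]
    # AC2: otherwise spread across hosts, using only workers without the flag.
    unrestricted = [w for w in free_workers if not one_host_only(w)]
    if len(unrestricted) >= cluster_size:
        return unrestricted[:cluster_size]
    return None
```

 The sketch prefers a single host when one has enough free slots and only falls back to spreading the cluster across unflagged workers. Whether the real scheduler should prefer one host or keep its current ordering is an open design question for this ticket.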
