action #135773

Updated by okurz 10 months ago

## Observation 
 See #134282-1 
 > There is something wrong with the multimachine network when tests are run across different workers. If a multimachine job is forced to run on the same worker, it is fine. 

 > There are fails in core group: 
 Kernel group: 

 The scenario is 

 ## Acceptance criteria 
 * **AC1:** The "ovs-client+ovs-server" test scenario passes consistently when running on multiple OSD workers with "tap" class 

 ## Suggestions 
 * Check the current fail ratio of the scenario when running on 
   * a *single* physical host (as reference) 
   * multiple hosts 
 * Thoroughly read #134282-3 
 * Read and check if that is applicable for us 
 * For easier reproduction and investigation, trigger openQA multi-machine clusters with PAUSE_AT, e.g. pausing after the systems have booted and potentially configured their network 
 * Check for MTU size related problems, e.g. with `ping` using big packet sizes and explicit selections of bridge or tap devices 

 ## Out of scope 
 * Anything that already fails when the multi-machine cluster runs on a single physical host 
 * #135035 "Pin multimachine jobs to a single worker" 
 * Any test other than "ovs-client+ovs-server" 
 * Minimizing the reproducer, e.g. by skipping test modules in openQA -> #135818 

 ## Workaround 
 Pin to a single physical machine