action #135773
[tools] many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers size:M
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-08-15
Due date: 2023-10-07
% Done: 0%
Estimated time:
Tags:
Description
Observation
See #134282-1
There is something wrong with the multi-machine network when tests are run across different workers. If a multi-machine job is forced to run on a single worker, it is fine.
There are failures in the core group: https://openqa.suse.de/tests/11843205#next_previous
Kernel group: https://openqa.suse.de/tests/11846943#next_previous
HPC: https://openqa.suse.de/tests/11845897#next_previous
The scenario is https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Updates&machine=64bit&test=ovs-client&version=15-SP5
Acceptance criteria
- AC1: The "ovs-client+ovs-server" test scenario passes consistently when running on multiple OSD workers with "tap" class
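One way to make "passes consistently" measurable is to look at the fail ratio over a batch of recent runs. The following is only a minimal sketch: the function name fail_ratio is hypothetical, the input is assumed to be a plain list of openQA job result strings collected by hand or by a script, and counting "softfailed" as a pass is an assumption that may not match how the team evaluates this scenario.

```python
# Hypothetical helper, not part of openQA: compute the fail ratio of a scenario
# from a list of job result strings such as "passed", "softfailed" or "failed".

# Assumption: "softfailed" jobs count as passing for the purpose of AC1.
OK_RESULTS = {"passed", "softfailed"}


def fail_ratio(results):
    """Return the fraction of jobs whose result is not in OK_RESULTS."""
    if not results:
        raise ValueError("no job results given")
    failed = sum(1 for result in results if result not in OK_RESULTS)
    return failed / len(results)
```

Comparing the value for runs pinned to a single physical host with the value for runs spread over multiple hosts gives the reference point mentioned in the suggestions.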
Suggestions
- Check for the current fail ratio of the scenario using https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation when running on
  - a single physical host (as reference)
  - multiple hosts
- Thoroughly read #134282-3
- Read https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.cookbook.mtu-mss.html and check if that is applicable for us
- For easier reproduction and investigation, trigger openQA multi-machine clusters with PAUSE_AT, see https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc, e.g. pausing after the systems have booted and possibly configured their network
- Check for MTU size related problems, e.g. with ping using big packet sizes and explicit selection of bridge or tap devices
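The MTU check suggested above could be scripted roughly as follows. This is a minimal sketch, assuming IPv4 and the Linux iputils ping, where -M do forbids fragmentation, -s sets the ICMP payload size and -I selects the outgoing interface. The function names, the list of candidate MTU values, the interface name br1 and the peer address 10.0.2.101 are placeholders for illustration, not values taken from this ticket.

```python
import subprocess

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def payload_for_mtu(mtu):
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu, interface=None, count=3):
    """Build an iputils ping command that must not be fragmented on the path."""
    cmd = ["ping", "-c", str(count), "-M", "do", "-s", str(payload_for_mtu(mtu))]
    if interface:
        # Explicitly select the bridge or tap device to send from.
        cmd += ["-I", interface]
    return cmd + [host]


def probe(host, mtus=(1500, 1460, 1400, 1350), interface=None):
    """Ping host once per candidate MTU and return {mtu: reachable}."""
    results = {}
    for mtu in mtus:
        proc = subprocess.run(
            ping_command(host, mtu, interface),
            capture_output=True,
            text=True,
        )
        results[mtu] = proc.returncode == 0
    return results


# Example call from inside a paused system under test (placeholder values):
#   probe("10.0.2.101", interface="br1")
```

If large payloads fail while smaller ones get through only when the peer runs on a different worker, that would point towards an MTU problem on the path between the workers.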
Out of scope
- Anything that already fails when the multi-machine cluster runs on a single physical host
- #135035 "Pin multimachine jobs to a single worker"
- Any other test than "ovs-client+server"
- Try to minimize the reproducer, e.g. skip test modules in openQA -> #135818
Workaround
Pin to a single physical machine