action #155929
openQA Project (public) - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
Try out rstp_enable=True in openqa/openvswitch.sls size:M
Status: Resolved
Priority: Normal
% Done: 0%
Description
Motivation
We suspect that our multi-machine setup with GRE tunnels and STP causes problems like the one in #155716-8, possibly because STP is too slow to adapt to topology changes, which makes openQA tests fail.
Acceptance criteria
- AC1: Temporary multi-machine test issues are prevented when worker hosts are temporarily unavailable
- AC2: RSTP does not cause more breakage than we had before
- AC3: Our documentation and salt states are up-to-date regarding STP+RSTP
Suggestions
- Read https://pve.proxmox.com/wiki/Open_vSwitch#Rapid_Spanning_Tree_.28RSTP.29 and enable the setting via Salt
- Read https://www.accuenergy.com/support/reference-directory/rapid-spanning-tree-protocol-rstp/#:~:text=Rapid%20Spanning%20Tree%20Protocol%20(RSTP%3A%20IEEE%20802.1w)%20is,free%E2%80%9D%20topology%20within%20Ethernet%20networks.
- Do a simple ping test between VMs (using a cluster of at least 3 machines connected via GRE) while one of the GRE nodes disconnects and reconnects (see http://open.qa/docs/#_start_test_vms_manually)
- Try via the MM openQA-in-openQA test by changing https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50 and adapting the openQA-in-openQA test to use that os-autoinst version instead of the stable package
- Try to reproduce the issue, e.g. using https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-HA-Incidents&machine=64bit&test=qam_ha_hawk_haproxy_node02&version=15-SP2, by running this test near-continuously and then, while the test is running, triggering a reboot of a machine that "ovs-appctl stp/show" shows to be crucial for the connection
- Then enable RSTP in the wicked hook scripts and possibly disable STP instead
- Repeat the experiment and check whether this significantly reduces the related problems
- If successful, ensure that https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50 and the salt-states are in sync, and that our configuration in http://open.qa/docs/ is up-to-date
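The Open vSwitch change behind "enable the setting via Salt" and "enable RSTP ... and possibly disable STP" can be sketched as a small Python helper. This is a minimal sketch, not the actual Salt state or wicked hook: it assumes a bridge named br1 (hypothetical here; use whatever bridge the Salt state configures) and relies on the stp_enable and rstp_enable columns of the OVS Bridge table. The "ovs-appctl rstp/show" check assumes an OVS version that provides that command. By default it only prints the commands.

```python
import shlex
import subprocess


def rstp_commands(bridge: str, disable_stp: bool = True) -> list[list[str]]:
    """Build ovs-vsctl/ovs-appctl invocations to switch an OVS bridge to RSTP.

    The bridge name is an assumption supplied by the caller.
    """
    settings = ["rstp_enable=true"]
    if disable_stp:
        # Turn classic STP off in the same transaction so only RSTP runs.
        settings.insert(0, "stp_enable=false")
    return [
        ["ovs-vsctl", "set", "Bridge", bridge, *settings],
        # Read the column back to confirm the database change.
        ["ovs-vsctl", "get", "Bridge", bridge, "rstp_enable"],
        # Show RSTP port roles/states (requires an OVS build with rstp/show).
        ["ovs-appctl", "rstp/show", bridge],
    ]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run against the hypothetical bridge "br1".
    apply(rstp_commands("br1"))
```

Keeping the command construction separate from execution makes it easy to compare the generated settings with what the Salt state and os-autoinst-setup-multi-machine actually apply.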
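To check whether RSTP significantly reduces the problems, the ping test between VMs can be turned into a number: the longest connectivity gap observed while a GRE node disconnects and reconnects. The following is a minimal sketch, assuming iputils ping on the VM and Python 3.9+; the peer address is whatever the other VM uses and is not specified here. Comparing the result of runs with STP and with RSTP would give a rough indication for AC1.

```python
import subprocess
import time


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo with iputils ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Longest span (seconds) from the first failed sample of a run to the next success."""
    longest = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # a new outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # connectivity restored
    # An outage still ongoing at the end counts up to the last sample.
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


def monitor(host: str, duration_s: float = 300.0, interval_s: float = 1.0) -> list[tuple[float, bool]]:
    """Ping host repeatedly for duration_s seconds, recording (timestamp, reachable)."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), ping_once(host)))
        time.sleep(interval_s)
    return samples
```

Usage would be to call longest_outage(monitor(peer_ip)) inside one VM while rebooting or disconnecting a GRE node. With a 1 s ping timeout plus a 1 s interval, the resolution is only about one to two seconds, which should still be enough to distinguish STP's slow reconvergence from RSTP's.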