action #135200
[qe-core] Implement a ping check with custom MTU packet size
Status: Closed
Description
Motivation
Every SUT has to have the correct MTU for GRE tunnels to work. Currently we have no tests covering this. Jobs fail after seemingly working setups have been asserted, but there is no validation of the network.
Acceptance criteria
- AC1: Multi-machine packet passing is asserted in a test
Suggestions
Updated by livdywan over 1 year ago
- Copied from action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry added
Updated by okurz over 1 year ago
- Tags deleted (infra)
- Project changed from openQA Infrastructure (public) to openQA Tests (public)
- Target version deleted (Ready)
In tests, not in infra
Updated by szarate over 1 year ago
- Sprint set to QE-Core: August Sprint 23 (Aug 09 - Sep 04)
- Tags set to qe-core-september-sprint
- Subject changed from Implement a ping check with custom MTU packet size to [qe-core] Implement a ping check with custom MTU packet size
- Category set to Enhancement to existing tests
Will move to urgent once we kick off the sprint tomorrow.
Updated by szarate over 1 year ago
- Sprint changed from QE-Core: August Sprint 23 (Aug 09 - Sep 04) to QE-Core: September Sprint 23 (Sep 06 - Oct 04)
Updated by szarate over 1 year ago
Where we have jumbo frames, they are on the uplinks between our devices, not facing the users.
I don't know about GRE tunnels using openvswitch, but it should be easy to test. One option is the tracepath command; the other is ping, as in:
ping -M do -c 1 -s 1500 <destination_IP>
-M do = prohibit fragmentation
Decrease 1500 by 10 until the ping succeeds and you'll find the lowest MTU along the path, or whether there are MTU issues at all. (Note that -s sets the ICMP payload size; the packet on the wire is 28 bytes larger due to the IPv4 and ICMP headers, so a payload of 1472 corresponds to an MTU of 1500.)
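As a minimal sketch of how such a probe could be scripted (assuming a Linux SUT with iputils ping; the destination IP is a placeholder argument):

#!/bin/sh
# Probe the usable MTU towards a destination by shrinking the ICMP payload
# until an unfragmented ping succeeds. The destination IP is a placeholder.
DEST_IP="${1:?usage: $0 <destination_IP>}"
size=1500
while [ "$size" -gt 0 ]; do
    # -M do: prohibit fragmentation; -c 1: single probe; -W 2: 2-second timeout
    if ping -M do -c 1 -W 2 -s "$size" "$DEST_IP" >/dev/null 2>&1; then
        # +28 accounts for the IPv4 (20 byte) and ICMP (8 byte) headers
        echo "largest unfragmented payload: $size bytes (path MTU ~ $((size + 28)))"
        exit 0
    fi
    size=$((size - 10))
done
echo "no unfragmented ping succeeded towards $DEST_IP" >&2
exit 1

Stepping down by 10 mirrors the suggestion above; a binary search would converge faster, but this is enough to flag MTU issues.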
Updated by szarate over 1 year ago
- Blocks action #131189: [qe-core] Introduce firewalld container test in ALP added
Updated by mkittler over 1 year ago
Verifying connectivity explicitly in tests sounds useful to distinguish real problems from issues with services that are just temporarily not reachable.
I just came to this ticket because I was pointed at a cluster of failing jobs. These jobs already log the MTU, e.g. see step https://openqa.suse.de/tests/12118392#step/before_test/49:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:0c:20 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe12:c20/64 scope link
       valid_lft forever preferred_lft forever
All the other jobs in the cluster also have an MTU of 1458. Not sure whether that value is to be trusted, though. Presumably it is even wrong (considering our issues) and one had to do a check like in #135200#note-6 to find out what works for real. Then one could try to set that via e.g. ifconfig eth0 mtu … and maybe that'll even help.
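For illustration, once a working value is known it could be applied with iproute2, the modern equivalent of the ifconfig call above (eth0 and 1458 are placeholders here):

# Set the interface MTU to the probed value (placeholders: eth0, 1458)
ip link set dev eth0 mtu 1458
# Confirm the change took effect
ip link show dev eth0 | grep -o 'mtu [0-9]*'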
Updated by pcervinka about 1 year ago
- Is duplicate of action #135818: [kernel] minimal reproducer for many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers added
Updated by pcervinka about 1 year ago
- Status changed from Workable to Resolved
Tests are stable. I guess there is nothing pending to do within this ticket.
Updated by szarate about 1 year ago
- Status changed from Resolved to Rejected
- Assignee set to dvenkatachala