action #135200

closed

[qe-core] Implement a ping check with custom MTU packet size

Added by livdywan over 1 year ago. Updated about 1 year ago.

Status: Rejected
Priority: High
Assignee:
Category: Enhancement to existing tests
Target version: -
Start date: 2023-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:
Sprint: QE-Core: September Sprint 23 (Sep 06 - Oct 04)

Description

Motivation

Every SUT has to have the correct MTU for GRE tunnels to work. Currently we have no tests covering this. Jobs fail after seemingly working setups have been asserted, but there is no validation of the network.

Acceptance criteria

  • AC1: Multi-machine packet passing is asserted in a test

Suggestions


Related issues 3 (0 open, 3 closed)

  • Is duplicate of openQA Tests (public) - action #135818: [kernel] minimal reproducer for many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers (Resolved, pcervinka, 2023-08-15)
  • Blocks openQA Tests (public) - action #131189: [qe-core] Introduce firewalld container test in ALP (Resolved, amanzini, 2023-06-21)
  • Copied from openQA Infrastructure (public) - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, nicksinger, 2023-08-15)
Actions #1

Updated by livdywan over 1 year ago

  • Copied from action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry added
Actions #2

Updated by okurz over 1 year ago

  • Tags deleted (infra)
  • Project changed from openQA Infrastructure (public) to openQA Tests (public)
  • Target version deleted (Ready)

In tests, not in infra

Actions #3

Updated by szarate over 1 year ago

  • Sprint set to QE-Core: August Sprint 23 (Aug 09 - Sep 04)
  • Tags set to qe-core-september-sprint
  • Subject changed from Implement a ping check with custom MTU packet size to [qe-core] Implement a ping check with custom MTU packet size
  • Category set to Enhancement to existing tests

Will move to urgent once we kick off the sprint tomorrow

Actions #4

Updated by szarate over 1 year ago

  • Sprint changed from QE-Core: August Sprint 23 (Aug 09 - Sep 04) to QE-Core: September Sprint 23 (Sep 06 - Oct 04)
Actions #5

Updated by szarate over 1 year ago

  • Status changed from New to Workable
Actions #6

Updated by szarate over 1 year ago

Where we have jumbo frames, they are on the uplinks between our devices, not facing the users.
I don't know about GRE tunnels using openvswitch, but it should be easy to test. One option is the tracepath command, the other is ping, as in:

ping -M do -c 1 -s 1500 <destination_IP>

-M do = prohibit fragmentation
Decrease 1500 in steps of 10 until the ping works; the largest size that passes tells you the usable MTU, or whether there are MTU issues at all.
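
The step-down procedure above could be scripted roughly as follows. This is only a sketch, not what any openQA test actually does: the function names `ping_probe` and `find_max_payload`, the 2-second timeout and the pluggable probe argument are assumptions made for illustration. Note that `-s` sets the ICMP payload size, so the IPv4 packet on the wire is 28 bytes larger (20-byte IPv4 header without options plus 8-byte ICMP header).

```shell
#!/bin/sh
# Sketch only: function names, defaults and the probe hook are illustrative.

# ping_probe DEST SIZE
# Sends one ICMP echo with the Don't Fragment bit set (-M do) and a payload
# of SIZE bytes; succeeds only if a reply arrives within 2 seconds (-W 2).
ping_probe() {
    ping -M do -c 1 -W 2 -s "$2" "$1" >/dev/null 2>&1
}

# find_max_payload DEST [START] [STEP] [PROBE]
# Counts down from START in steps of STEP and prints the first payload size
# that PROBE accepts. Returns 1 if no size works at all.
find_max_payload() {
    dest=$1
    size=${2:-1500}
    step=${3:-10}
    probe=${4:-ping_probe}
    while [ "$size" -gt 0 ]; do
        if "$probe" "$dest" "$size"; then
            echo "$size"
            return 0
        fi
        size=$((size - step))
    done
    return 1
}

# Example call (commented out, needs a reachable peer):
#   find_max_payload <destination_IP>
```

Because the step is 10, the result is only accurate to within 10 bytes; a smaller step or a bisection would narrow it down further.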

Actions #7

Updated by szarate over 1 year ago

  • Blocks action #131189: [qe-core] Introduce firewalld container test in ALP added
Actions #8

Updated by mkittler over 1 year ago

Verifying the connectivity explicitly in tests sounds useful to distinguish real problems from issues with services that are just temporarily not reachable.

I just came to this ticket because I was pointed at a cluster of failing jobs. These jobs already log the MTU, e.g. see step https://openqa.suse.de/tests/12118392#step/before_test/49:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:12:0c:20 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe12:c20/64 scope link 
       valid_lft forever preferred_lft forever

All the other jobs in the cluster also have an MTU of 1458. Not sure whether that value is to be trusted, though. Supposedly it is even wrong (considering our issues) and one would have to do a check like in #135200#note-6 to find out what really works. Then one could try to set that via e.g. ifconfig eth0 mtu … and maybe that'll even help.
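
To compare the logged MTU against what actually passes, one could derive the matching ping payload from the `ip addr` line. The following is a minimal sketch under assumptions: the helper names `mtu_from_ip_line` and `payload_for_mtu` are hypothetical, and the 28-byte overhead assumes IPv4 without options plus the 8-byte ICMP header.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, IPv4 without options assumed.

# mtu_from_ip_line LINE
# Extracts the number following "mtu" from one line of `ip addr` output.
mtu_from_ip_line() {
    printf '%s\n' "$1" | sed -n 's/.* mtu \([0-9][0-9]*\) .*/\1/p'
}

# payload_for_mtu MTU
# Largest ICMP echo payload that fits into a single IPv4 packet of size MTU
# (20-byte IPv4 header + 8-byte ICMP header = 28 bytes of overhead).
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Example use (commented out, needs a reachable peer):
#   line=$(ip addr show dev eth0 | head -n 1)
#   ping -M do -c 1 -s "$(payload_for_mtu "$(mtu_from_ip_line "$line")")" <destination_IP>
```

If the ping with the derived payload fails while a smaller one passes, the configured MTU is larger than what the path supports, which would match the suspicion above.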

Actions #9

Updated by pcervinka about 1 year ago

  • Is duplicate of action #135818: [kernel] minimal reproducer for many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers added
Actions #10

Updated by pcervinka about 1 year ago

  • Status changed from Workable to Resolved

Tests are stable. I guess there is nothing pending to do within this ticket.

Actions #11

Updated by szarate about 1 year ago

  • Status changed from Resolved to Rejected
  • Assignee set to dvenkatachala