action #135818

closed

[kernel] minimal reproducer for many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2023-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:
Tags:

Description

[tools] many multi-machine test failures in "ovs-client+ovs-server" test scenario when tests are run across different workers

Observation

See #134282-1

Something is wrong with the multi-machine network when tests are run across different workers. If the multi-machine job is forced to run on the same worker, it works fine.

There are failures in the core group: https://openqa.suse.de/tests/11843205#next_previous
Kernel group: https://openqa.suse.de/tests/11846943#next_previous
HPC: https://openqa.suse.de/tests/11845897#next_previous

The scenario is https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Updates&machine=64bit&test=ovs-client&version=15-SP5

To work more efficiently on #135773, it would be very helpful to have a minimal openQA test scenario that reproduces the problem.

Suggestions

  • Try to minimize the reproducer, e.g. skip test modules in openQA

Related issues 3 (0 open, 3 closed)

Related to openQA Tests - action #136136: [qe-core] jobs scheduled for currently not available worker31 "ovs-server+client" (Resolved, dzedro, 2023-09-20)

Has duplicate openQA Tests - action #135200: [qe-core] Implement a ping check with custom MTU packet size (Rejected, dvenkatachala, 2023-08-15)

Copied from openQA Infrastructure - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, nicksinger, 2023-08-15)

#1

Updated by okurz about 1 year ago

  • Copied from action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry added
#2

Updated by pcervinka about 1 year ago

We don't own ovs-client+ovs-server, but you can use https://openqa.suse.de/tests/12135353# as a minimal reproducer.

#3

Updated by pcervinka about 1 year ago

  • Assignee set to pcervinka

I will prepare a simple ping scenario next week.

#5

Updated by pcervinka about 1 year ago

We agreed on the call that I will extend the existing multipath mm test with a ping check (different MTU sizes).
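A ping check with different MTU sizes, as agreed above and as requested in #135200, usually sends ICMP echo requests with fragmentation prohibited and a payload sized so that the whole IPv4 packet equals the MTU under test: payload = MTU - 20 (IPv4 header) - 8 (ICMP header). The sketch below only builds the corresponding iputils `ping` command lines. It is an illustration under those assumptions, not the code from the merged PR; the helper names, the peer address `10.0.2.102`, and the MTU list are hypothetical placeholders.

```python
from typing import Iterable, List

# Assumed header sizes: plain IPv4 without options, ICMP echo header.
# Any tunnel/VLAN overhead between workers is not counted here.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Return the largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is smaller than the IPv4 + ICMP headers")
    return payload


def ping_commands(peer: str, mtus: Iterable[int], count: int = 3) -> List[str]:
    """Build one iputils ping command line per MTU to probe (hypothetical helper).

    -M do : prohibit fragmentation (set the DF bit)
    -s N  : ICMP payload size in bytes
    -c N  : number of echo requests to send
    """
    return [
        f"ping -M do -s {icmp_payload_for_mtu(mtu)} -c {count} {peer}"
        for mtu in mtus
    ]


if __name__ == "__main__":
    # Hypothetical peer address and MTU values; adjust to the scenario under test.
    for cmd in ping_commands("10.0.2.102", [1280, 1450, 1500]):
        print(cmd)
```

Because fragmentation is prohibited, a run where the smaller sizes succeed but a larger one fails would suggest that something on the path between the two workers cannot carry packets of that size.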

#6

Updated by pcervinka about 1 year ago

  • Status changed from New to In Progress
#9

Updated by pcervinka about 1 year ago

  • Has duplicate action #135200: [qe-core] Implement a ping check with custom MTU packet size added
#10

Updated by okurz about 1 year ago

  • Related to action #136136: [qe-core] jobs scheduled for currently not available worker31 "ovs-server+client" added
#12

Updated by pcervinka about 1 year ago

  • Status changed from In Progress to Feedback
  • Priority changed from High to Normal

PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17817 was merged; let's see how it behaves.

#13

Updated by pcervinka about 1 year ago

Works fine: https://openqa.suse.de/tests/12224267/logfile?filename=serial_terminal.txt.
The test was scheduled across workers worker40 and worker37.

#14

Updated by pcervinka about 1 year ago

Let's wait for tomorrow's daily jobs.

#15

Updated by pcervinka about 1 year ago

All mm jobs in the kernel job group were fine today (nothing to troubleshoot).

#16

Updated by pcervinka about 1 year ago

  • Status changed from Feedback to Resolved

The tests are stable. I guess there is nothing left to do in this ticket.
