action #160646

closed

openQA Project - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

multiple multi-machine test failures, no GRE tunnels are set up between machines anymore at all size:M

Added by okurz about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-05-21
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

Originally reported in https://suse.slack.com/archives/C02CANHLANP/p1716169544132569

(Richard Fan) Hello experts, many multi-machine tests have failed, like the MM failed jobs on qe-core

As visible in https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1716124476616&to=1716212983933&viewPanel=24
there is an increase of parallel_failed jobs at 2024-05-19 22:30.

E.g. the openQA test in scenario sle-15-SP3-Server-DVD-Updates-x86_64-qam_kernel_multipath@64bit fails in multipath_iscsi, which shows that this is not an MTU problem, as the error message is "connect: Network is unreachable".

ssh worker29.oqa.prg2.suse.org "cat /etc/wicked/scripts/gre_tunnel_preup.sh" shows the problem:

#!/bin/sh
action="$1"
bridge="$2"
# enable STP for the multihost bridges
ovs-vsctl set bridge $bridge stp_enable=false
ovs-vsctl set bridge $bridge rstp_enable=true
for gre_port in $(ovs-vsctl list-ifaces $bridge | grep gre) ; do ovs-vsctl --if-exists del-port $bridge $gre_port ; done

After the last line there should be a list of GRE tunnel interface setup calls between those machines, but no GRE tunnels are set up at all.
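
A quick way to spot this on a worker could be to count the GRE setup calls in the generated script. The following is only a sketch: the helper name count_gre_setup_calls is hypothetical, and it assumes the setup calls have the form "ovs-vsctl --may-exist add-port $bridge gre1 -- set interface gre1 type=gre options:remote_ip=...", which is not confirmed by the script output above.

```shell
#!/bin/sh
# Hypothetical helper: count GRE tunnel setup calls in a wicked pre-up script.
# Assumption: a setup call is a line containing "add-port" followed by "type=gre", e.g.
#   ovs-vsctl --may-exist add-port $bridge gre1 -- set interface gre1 type=gre options:remote_ip=<IP>
count_gre_setup_calls() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when nothing matches, so "|| true" keeps the function from failing.
    grep -c -E 'add-port .*type=gre' "$1" || true
}
```

For example, count_gre_setup_calls /etc/wicked/scripts/gre_tunnel_preup.sh would print 0 for the script shown above, because the only line mentioning GRE ports there is the del-port cleanup loop.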

Reproducible

Fails since
https://openqa.suse.de/tests/overview?result=parallel_failed&distri=sle&version=15-SP4&build=20240519-1

Expected result

Last good: 20240519-1 (or more recent)

Suggestions

Rollback steps

Further details

Always latest result in the originally mentioned scenario: latest


Related issues 2 (1 open, 1 closed)

Related to openQA Infrastructure - action #161381: multi-machine test network issues reported 2024-06-03 due to missing content in the salt mine size:S (Resolved, mkittler, 2024-06-03, 2024-06-18)

Copied to openQA Infrastructure - action #160826: Optimize gre_tunnel_preup.sh generation jinja template (New, 2024-05-21)
