action #54074

closed

network tap / bridge setup is broken on some workers ( openqaworker7 )

Added by asmorodskyi almost 5 years ago. Updated over 4 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
Start date: 2019-07-10
Due date:
% Done: 0%
Estimated time:
Description

On some workers I see this error message in the os-autoinst log:

Failed to run dbus command 'set_vlan' with arguments 'tap13 2' : 'tap13' is not connected to bridge 'br1'

During the test run this causes a broken network (ping does not work, no DNS name resolution):

https://openqa.suse.de/tests/3043543/file/autoinst-log.txt


Related issues: 1 (0 open, 1 closed)

Is duplicate of openQA Tests - action #54632: [tools] openqaworker7 is causing failures (Rejected, 2019-07-25)

Actions #2

Updated by acarvajal almost 5 years ago

Also seeing many MM network problems with openqaworker7 in the HA group for a week or so.

Initially thought the problem was also present in openqaworker5, but haven't seen failed jobs with openqaworker5 due to network issues since the openqa.suse.de upgrade from July 9th/10th.

Failed tests sample:

1) https://openqa.suse.de/tests/3055962#step/setup/25 ([debug] Failed to run dbus command 'set_vlan' with arguments 'tap19 7' : 'tap19' is not connected to bridge 'br1')

2) https://openqa.suse.de/tests/3071752#step/setup/25 ([debug] Failed to run dbus command 'set_vlan' with arguments 'tap15 3' : 'tap15' is not connected to bridge 'br1')

3) https://openqa.suse.de/tests/3054586#step/upgrade_from_sle11sp4_workarounds/16 ([debug] Failed to run dbus command 'set_vlan' with arguments 'tap14 4' : 'tap14' is not connected to bridge 'br1')

4) https://openqa.suse.de/tests/3071747#step/upgrade_from_sle11sp4_workarounds/16 ([debug] Failed to run dbus command 'set_vlan' with arguments 'tap11 6' : 'tap11' is not connected to bridge 'br1')

5) https://openqa.suse.de/tests/3071758#step/upgrade_from_sle11sp4_workarounds/16 ([debug] Failed to run dbus command 'set_vlan' with arguments 'tap11 3' : 'tap11' is not connected to bridge 'br1')

The last job ran in parallel with https://openqa.suse.de/tests/3071756, which also ran on openqaworker7 and doesn't show the 'set_vlan' error in its autoinst-log.txt: https://openqa.suse.de/tests/3071756/file/autoinst-log.txt

It seems that the second job used tap3, which looks different from tap11 when checked with 'ip a' on openqaworker7 (tap3 is attached to 'master ovs-system', tap11 has no master):

41: tap3: mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether e2:b9:5f:d0:95:1d brd ff:ff:ff:ff:ff:ff

vs.

11: tap11: mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether c2:5b:f9:9c:f4:2b brd ff:ff:ff:ff:ff:ff

There's a startup file for both interfaces:

openqaworker7:~ # ls -l /etc/sysconfig/network/ifcfg-tap{3,11}
-rw------- 1 root root 141 Jul 8 15:55 /etc/sysconfig/network/ifcfg-tap11
-rw------- 1 root root 141 Jul 8 15:55 /etc/sysconfig/network/ifcfg-tap3

But tap11 is not present in /etc/sysconfig/network/ifcfg-br1:

openqaworker7:~ # egrep 'tap3|tap11' /etc/sysconfig/network/ifcfg-br1
OVS_BRIDGE_PORT_DEVICE_3='tap3'

In short, it seems that the bridge configuration is missing for 30 out of 60 tap interfaces:

openqaworker7:~ # grep OVS_BRIDGE_PORT_DEVICE_ /etc/sysconfig/network/ifcfg-br1 |wc -l
30
openqaworker7:~ # ls /etc/sysconfig/network/ifcfg-tap*|wc -l
60

Checking other workers, this is not the case:

openqaworker6:~ # grep OVS_BRIDGE_PORT_DEVICE_ /etc/sysconfig/network/ifcfg-br1 |wc -l
60
openqaworker6:~ # ls /etc/sysconfig/network/ifcfg-tap*|wc -l
60
openqaworker5:~ # grep OVS_BRIDGE_PORT_DEVICE_ /etc/sysconfig/network/ifcfg-br1 |wc -l
66
openqaworker5:~ # ls /etc/sysconfig/network/ifcfg-tap*|wc -l
66
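
The counts above only show that something is missing, not which interfaces. A minimal sketch that lists the tap interfaces with an ifcfg-tap* file but no OVS_BRIDGE_PORT_DEVICE_* entry in the bridge config. The function name missing_bridge_ports is made up for illustration, and it assumes every port entry has the form OVS_BRIDGE_PORT_DEVICE_<number>='tapN' on a line of its own, as in the grep output above:

```shell
# List tap interfaces that have an ifcfg-tap* file but no
# OVS_BRIDGE_PORT_DEVICE_* entry in the bridge config.
# Usage: missing_bridge_ports [config_dir] [bridge]
# ASSUMPTION: entries look like OVS_BRIDGE_PORT_DEVICE_3='tap3'.
missing_bridge_ports() {
    local dir="${1:-/etc/sysconfig/network}"
    local bridge="${2:-br1}"
    local f tap
    for f in "$dir"/ifcfg-tap*; do
        [ -e "$f" ] || continue      # glob did not match any file
        tap="${f##*/ifcfg-}"         # .../ifcfg-tap11 -> tap11
        # -x matches the whole line, so 'tap1' never matches 'tap11'
        grep -qx "OVS_BRIDGE_PORT_DEVICE_[0-9]*='$tap'" "$dir/ifcfg-$bridge" \
            || echo "$tap"
    done
}
```

Called without arguments on a worker, it checks /etc/sysconfig/network against br1. Unlike the egrep above, the whole-line match does not confuse tap3 with tap30.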

Actions #3

Updated by acarvajal almost 5 years ago

I'm seeing in the salt pillars that 'numofworkers' for openqaworker7 was reduced from 20 to 10 as a consequence of poo#49694.

Since there are 60 interfaces but only half are configured for the bridge, I think this is related.
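
The numbers fit if every worker instance gets three tap devices (20 x 3 = 60 ifcfg-tap files, 10 x 3 = 30 bridge ports). A rough sketch of that arithmetic, assuming the naming scheme from the openQA networking documentation where instance N uses tap(N-1) plus the same index offset by 64 and 128 for additional NICs. Whether the salt states on this host generate exactly this layout is not confirmed in this ticket, and the function names are made up for illustration:

```shell
# Print the tap devices expected for a given number of worker instances.
# ASSUMPTION: instance N uses tap(N-1), tap(N-1+64) and tap(N-1+128).
expected_taps() {
    local n="$1" i=0 offset
    while [ "$i" -lt "$n" ]; do
        for offset in 0 64 128; do
            echo "tap$((i + offset))"
        done
        i=$((i + 1))
    done
}

# Succeed if tap device $1 is expected for $2 worker instances.
tap_is_expected() {
    expected_taps "$2" | grep -qx "$1"
}
```

Under that assumption, `tap_is_expected tap3 10` succeeds while `tap_is_expected tap11 10` fails, which would match the behaviour reported in this ticket: with 'numofworkers' at 10 only tap0-tap9, tap64-tap73 and tap128-tap137 would be bridge ports.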

Actions #5

Updated by jlausuch over 4 years ago

Another occurrence with openqaworker7 : https://openqa.suse.de/tests/3125427#

[2019-07-25T00:27:03.793 CEST] [debug] Failed to run dbus command 'set_vlan' with arguments 'tap10 6' : 'tap10' is not connected to bridge 'br1'
[2019-07-25T00:27:03.834 CEST] [debug] Failed to run dbus command 'set_vlan' with arguments 'tap74 6' : 'tap74' is not connected to bridge 'br1'
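
To collect the affected tap devices from a job log without reading it by hand, something like the following could be used. It is a sketch that assumes the message format stays exactly as quoted above; the function name unconnected_taps is made up for illustration:

```shell
# Print the unique tap devices named in "is not connected to bridge"
# errors. Reads an os-autoinst log on stdin.
# ASSUMPTION: the log line format is exactly the one quoted in this ticket.
unconnected_taps() {
    sed -n "s/.*Failed to run dbus command 'set_vlan' with arguments '\(tap[0-9]*\) [0-9]*' : .* is not connected to bridge.*/\1/p" \
        | sort -u
}
```

For example, `curl -s https://openqa.suse.de/tests/3043543/file/autoinst-log.txt | unconnected_taps` would print one tap name per line for the log linked in the description.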

Actions #6

Updated by nicksinger over 4 years ago

  • Is duplicate of action #54632: [tools] openqaworker7 is causing failures added

Actions #7

Updated by nicksinger over 4 years ago

  • Status changed from New to Rejected

I'm on it
