action #96260

closed

coordination #96185: [epic] Multimachine failure rate increased

Failed to add GRE tunnel to openqaworker10 on most OSD workers, recent regression explaining multi-machine errors? size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-07-29
Due date:
% Done: 0%
Estimated time:

Description

Observation

From OSD:

sudo salt -l error --state-output=changes -C 'G@roles:worker' cmd.run 'ovs-vsctl show | grep -C 3 error'                     
openqaworker2.suse.de:
openqaworker8.suse.de:
                Interface gre6
                    type: gre
                    options: {remote_ip="10.160.2.20"}
                    error: "could not add network device gre6 to ofproto (File exists)"
            Port gre7
                Interface gre7
                    type: gre
                    options: {remote_ip="10.160.2.20"}
                    error: "could not add network device gre7 to ofproto (File exists)"
            Port tap75
                Interface tap75
            Port tap14
openqaworker3.suse.de:
… same error
openqaworker9.suse.de:
… same error
openqaworker6.suse.de:
… same error
openqaworker5.suse.de:
… same error
QA-Power8-5-kvm.qa.suse.de:
… same error
QA-Power8-4-kvm.qa.suse.de:
malbec.arch.suse.de:
… same error
powerqaworker-qam-1.qa.suse.de:
grenache-1.qa.suse.de:
openqaworker13.suse.de:
openqaworker10.suse.de:
                Interface gre6
                    type: gre
                    options: {remote_ip="10.160.2.20"}
                    error: "could not add network device gre6 to ofproto (File exists)"
            Port tap8
                Interface tap8
            Port tap128
openqaworker-arm-1.suse.de:
                Interface gre6
                    type: gre
                    options: {remote_ip="10.160.2.20"}
                    error: "could not add network device gre6 to ofproto (File exists)"
            Port tap132
                Interface tap132
            Port gre9
    --
                Interface gre5
                    type: gre
                    options: {remote_ip="10.160.2.20"}
                    error: "could not add network device gre5 to ofproto (File exists)"
            Port tap130
                Interface tap130
            Port gre4
openqaworker-arm-2.suse.de:
… same error
openqaworker-arm-3.suse.de:
… same error
ERROR: Minions returned with non-zero exit code

So the same error appears on most workers, but not all. The IPv4 address 10.160.2.20 belongs to openqaworker10, and the job is running on openqaworker10 itself. So it looks like we fail to add a GRE tunnel to the same host?
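To see at a glance which minions report the error and which remote IP each failing GRE interface points at, the salt output above can be condensed with a small awk filter. This is a hypothetical helper (`gre_errors_by_host` is not an existing tool), a sketch assuming the output layout shown above: minion names start in column 0 and end with ":", and `grep -C 3` always keeps the `Interface` and `options` lines of an errored port.

```shell
# Hypothetical helper: read the salt output on stdin and print one line per
# errored Open vSwitch interface as "<minion> <interface> <remote_ip>".
# Assumes the layout produced by
#   salt ... cmd.run 'ovs-vsctl show | grep -C 3 error'
gre_errors_by_host() {
  awk '
    # A non-indented line ending in ":" names the minion (worker host).
    /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
    # Remember the most recent interface name; reset its remote_ip.
    $1 == "Interface" { iface = $2; ip = "" }
    # Extract the value of remote_ip="..." from the options line.
    /remote_ip="/ {
      match($0, /remote_ip="[^"]*"/)
      ip = substr($0, RSTART + 11, RLENGTH - 12)
    }
    # An "error:" line belongs to the interface seen just before it.
    $1 == "error:" { print host, iface, ip }
  '
}
```

Piping the salt output through it and then through `awk '$3 == "10.160.2.20"'` would list every worker whose tunnel to openqaworker10 failed, including openqaworker10 itself. Minions shown above only as "… same error" are of course not recoverable from this pasted text.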

Suggestion


Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #96938: openqaworker10+13 are offline, reason unknown, let's fix other problems first size:M (Resolved, mkittler, 2021-08-16)
