action #109494
openQA Project (public) - coordination #101048 (closed): [epic] Investigate and fix higher instability of openqaworker-arm-4/5 vs. arm-1/2/3
Restore network connection of arm-4/5 size:M
Description
Observation
After starting arm-4/5 today (to investigate #109232) both workers were unable to get an IP address.
I couldn't find messages from them in /var/log/messages on qanet.qa.suse.de. I also couldn't ping it via arping:
openqaworker-arm-4:~ # arping -I eth0 -b 10.162.0.1
ARPING 10.162.0.1 from 10.0.2.2 eth0
There are no failed systemd units, including wicked. The network config looks like this:
openqaworker-arm-4:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 18:c0:4d:8c:82:8e brd ff:ff:ff:ff:ff:ff
altname eno1
altname enp11s0f0
inet6 fe80::1ac0:4dff:fe8c:828e/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 18:c0:4d:8c:82:8f brd ff:ff:ff:ff:ff:ff
altname eno2
altname enp11s0f1
18: ovs-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ae:1e:c1:a3:0f:39 brd ff:ff:ff:ff:ff:ff
19: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 4e:40:d5:d2:bf:43 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.2/15 brd 10.1.255.255 scope global br1
valid_lft forever preferred_lft forever
inet6 fe80::4c40:d5ff:fed2:bf43/64 scope link
valid_lft forever preferred_lft forever
20: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 72:c6:8a:8a:03:d2 brd ff:ff:ff:ff:ff:ff
21: tap64: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether f2:41:3b:31:bb:29 brd ff:ff:ff:ff:ff:ff
22: tap128: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether f6:fc:23:fa:80:c3 brd ff:ff:ff:ff:ff:ff
23: tap1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 62:ee:a0:ff:3e:38 brd ff:ff:ff:ff:ff:ff
24: tap65: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 4a:4e:4c:01:9c:79 brd ff:ff:ff:ff:ff:ff
25: tap129: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 2a:f6:47:e0:e8:88 brd ff:ff:ff:ff:ff:ff
26: tap2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 2e:6c:54:d2:6c:fc brd ff:ff:ff:ff:ff:ff
27: tap66: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether f6:c8:0f:c7:de:7a brd ff:ff:ff:ff:ff:ff
28: tap130: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 8e:e0:72:c5:d3:4f brd ff:ff:ff:ff:ff:ff
29: tap3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether ee:32:a5:78:2b:eb brd ff:ff:ff:ff:ff:ff
30: tap67: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 8a:d3:a0:2e:f4:f0 brd ff:ff:ff:ff:ff:ff
31: tap131: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 7e:4e:54:78:eb:14 brd ff:ff:ff:ff:ff:ff
So eth0 is up, although it only has a link-local IPv6 address. It looks exactly the same on arm-5.
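The check done by eye above (a link that is up but has no IPv4 address) can be scripted. This is a minimal sketch, assuming the `ip addr` output format shown in this ticket; the function name `up_without_ipv4` is hypothetical and not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip addr` output on stdin and prints every
# interface that reports "state UP" but has no IPv4 ("inet") address.
up_without_ipv4() {
    awk '
        # A new interface stanza starts with "<index>: <name>: <FLAGS> ..."
        /^[0-9]+: / {
            if (name != "" && up && !has_v4) print name
            name = $2
            sub(/:$/, "", name)
            up = ($0 ~ /state UP/)
            has_v4 = 0
            next
        }
        # The first field is exactly "inet" for IPv4 lines ("inet6" is IPv6)
        $1 == "inet" { has_v4 = 1 }
        END { if (name != "" && up && !has_v4) print name }
    '
}

# Intended usage on a worker: ip addr | up_without_ipv4
```

Fed the arm-4 output above, this would print only `eth0`: lo, br1 and ovs-system report state UNKNOWN, and eth1 and the tap devices are DOWN.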
Updated by mkittler over 2 years ago
- Blocks action #109232: Document relevant differences of arm-4/5 vs. arm-1/2/3 and aarch64.o.o, involve domain experts in asking what parameters are important to be able to run openQA tests size:M added
Updated by mkittler over 2 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
Updated by okurz over 2 years ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by mkittler over 2 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
Since we don't know which switches those workers are connected to, I'll file an Infra ticket.
Updated by mkittler over 2 years ago
- Status changed from In Progress to Feedback
Updated by mkittler over 2 years ago
That's the ip addr output on arm-5 (for the MAC address):
openqaworker-arm-5:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 18:c0:4d:06:ce:57 brd ff:ff:ff:ff:ff:ff
altname eno1
altname enp11s0f0
inet6 fe80::1ac0:4dff:fe06:ce57/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 18:c0:4d:06:ce:58 brd ff:ff:ff:ff:ff:ff
altname eno2
altname enp11s0f1
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether ee:2e:6a:fe:6a:e1 brd ff:ff:ff:ff:ff:ff
5: tap64: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 86:8b:d4:d6:2c:dd brd ff:ff:ff:ff:ff:ff
6: tap128: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 36:7a:b4:24:ae:a8 brd ff:ff:ff:ff:ff:ff
7: tap1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 26:b5:b3:1b:cb:c5 brd ff:ff:ff:ff:ff:ff
8: tap65: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 96:bf:fa:d1:e4:cd brd ff:ff:ff:ff:ff:ff
9: tap129: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 22:b2:6c:31:6f:89 brd ff:ff:ff:ff:ff:ff
10: tap2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 9e:18:69:25:1b:58 brd ff:ff:ff:ff:ff:ff
11: tap66: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 36:be:02:ab:5c:4c brd ff:ff:ff:ff:ff:ff
12: tap130: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether d6:44:54:0b:76:1e brd ff:ff:ff:ff:ff:ff
13: tap3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 3a:d6:af:60:c6:84 brd ff:ff:ff:ff:ff:ff
14: tap67: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether 7e:b7:6e:fd:47:24 brd ff:ff:ff:ff:ff:ff
15: tap131: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
link/ether ae:ee:00:92:c9:cc brd ff:ff:ff:ff:ff:ff
16: ovs-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ce:5c:e0:f6:e4:56 brd ff:ff:ff:ff:ff:ff
17: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ea:c8:48:36:27:48 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.2/15 brd 10.1.255.255 scope global br1
valid_lft forever preferred_lft forever
inet6 fe80::e8c8:48ff:fe36:2748/64 scope link
valid_lft forever preferred_lft forever
Updated by nicksinger over 2 years ago
- Assignee changed from mkittler to nicksinger
I've asked lmb@suse.de to help us reconfigure the switches. Until we have a proper network connection on them I will take the ticket.
Updated by nicksinger over 2 years ago
From "Flurfunk" I heard that lmb seems to be out of office currently, so I decided to move these machines into a different rack, to a switch controlled by us. The new location is Rack 4, right next to our "QA Racks": https://racktables.suse.de/index.php?page=rack&rack_id=522. I've also added cable connections to https://racktables.suse.de/index.php?page=object&tab=ports&object_id=11969 (host OS ethernet uplink) and https://racktables.suse.de/index.php?page=object&tab=ports&object_id=996 (BMC connections). qanet20nue was reconfigured to provide untagged VLAN12 to the BMCs, and both can be reached under their documented ip/hostname. openqaworker-arm-4 was able to receive an IP from qanet again and is now reachable normally via ssh. Unfortunately the fiber connection for arm-5 is broken: the switch reports "RX Loose" and no IP assignment is possible. This cable needs to be replaced again.
Updated by mkittler over 2 years ago
- Blocks deleted (action #109232: Document relevant differences of arm-4/5 vs. arm-1/2/3 and aarch64.o.o, involve domain experts in asking what parameters are important to be able to run openQA tests size:M)
Updated by okurz over 2 years ago
- Subject changed from Restore network connection of arm-4/5 to Restore network connection of arm-4/5 size:M
nicksinger plans to do the physical work over the next few days.
Updated by nicksinger over 2 years ago
- Status changed from Feedback to Resolved
I switched the fiber cables. Now arm-5 is also reachable again:
selenium ~ » ping openqaworker-arm-5
PING openqaworker-arm-5.qa.suse.de (10.162.6.203) 56(84) bytes of data.
64 bytes from openqaworker-arm-5.qa.suse.de (10.162.6.203): icmp_seq=1 ttl=64 time=0.219 ms
64 bytes from openqaworker-arm-5.qa.suse.de (10.162.6.203): icmp_seq=2 ttl=64 time=0.198 ms