action #112553

openQA Project - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

action #105594: Two new machines for OSD and o3, meant for bare-metal virtualization size:M

[osd][amd][zen3][network][sriov] New AMD Zen3 machine on OSD lost its network connection on the p3p1 interface

Added by waynechen55 about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2022-06-16
Due date:
% Done: 0%
Estimated time:

Description

Observation

The second network link of the new Zen3 machine on OSD is down: interface p3p1 is completely down.

According to dhcpd.conf:
host amd-zen3-gpu-sut1-2 { hardware ethernet b4:96:91:9c:5a:d4; fixed-address 10.162.2.133; option host-name "amd-zen3-gpu-sut1-2"; filename "pxelinux.0"; }

But ip addr show reports:
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether ec:2a:72:02:84:20 brd ff:ff:ff:ff:ff:ff
    altname eno8303
    altname enp225s0f0
3: p3p1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:96:91:9c:5a:d4 brd ff:ff:ff:ff:ff:ff
    altname enp65s0f0
4: em2: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ec:2a:72:02:84:21 brd ff:ff:ff:ff:ff:ff
    altname eno8403
    altname enp225s0f1
5: p3p2: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:96:91:9c:5a:d5 brd ff:ff:ff:ff:ff:ff
    altname enp65s0f1
6: br0: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ec:2a:72:02:84:20 brd ff:ff:ff:ff:ff:ff
    inet 10.162.2.132/18 brd 10.162.63.255 scope global br0
       valid_lft forever preferred_lft forever
    inet6 2620:113:80c0:80a0:10:162:31:5733/64 scope global dynamic noprefixroute
       valid_lft 2331023sec preferred_lft 1359023sec
    inet6 fe80::ee2a:72ff:fe02:8420/64 scope link
       valid_lft forever preferred_lft forever
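The mismatch above (the MAC address from the dhcpd host entry belongs to an interface that is DOWN and carries no IPv4 address) can be checked mechanically. This is a minimal Python sketch, assuming the dhcpd host entry is a single line as shown and the ip addr output is available as text; parse_dhcpd_host, parse_ip_addr and check_host are hypothetical helper names, not part of any openQA tooling.

```python
import re

# Excerpt of the `ip addr show` output from this ticket (interface flags were lost in the paste).
IP_ADDR_SAMPLE = """\
2: em1: mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether ec:2a:72:02:84:20 brd ff:ff:ff:ff:ff:ff
3: p3p1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:96:91:9c:5a:d4 brd ff:ff:ff:ff:ff:ff
6: br0: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ec:2a:72:02:84:20 brd ff:ff:ff:ff:ff:ff
    inet 10.162.2.132/18 brd 10.162.63.255 scope global br0
"""

DHCPD_LINE = ('host amd-zen3-gpu-sut1-2 { hardware ethernet b4:96:91:9c:5a:d4; '
              'fixed-address 10.162.2.133; option host-name "amd-zen3-gpu-sut1-2"; '
              'filename "pxelinux.0"; }')


def parse_dhcpd_host(line):
    """Extract (hostname, mac, fixed_address) from one ISC dhcpd 'host' entry.

    Raises AttributeError if a field is missing (acceptable for a sketch).
    """
    name = re.search(r"host\s+(\S+)\s*\{", line).group(1)
    mac = re.search(r"hardware ethernet\s+([0-9a-fA-F:]{17});", line).group(1).lower()
    addr = re.search(r"fixed-address\s+([\d.]+);", line).group(1)
    return name, mac, addr


def parse_ip_addr(text):
    """Map interface name -> {'state', 'mac', 'ipv4'} from `ip addr show` text."""
    ifaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"\d+:\s+(\S+?):\s.*?state\s+(\S+)", line)
        if header:
            current = {"state": header.group(2), "mac": None, "ipv4": []}
            ifaces[header.group(1)] = current
        elif current is not None and line.startswith("link/ether"):
            current["mac"] = line.split()[1].lower()
        elif current is not None and line.startswith("inet "):
            current["ipv4"].append(line.split()[1])  # e.g. "10.162.2.132/18"
    return ifaces


def check_host(dhcpd_line, ip_addr_text):
    """Return (interface, state, has_dhcpd_address) for the dhcpd MAC, or None."""
    _, mac, addr = parse_dhcpd_host(dhcpd_line)
    for iface, info in parse_ip_addr(ip_addr_text).items():
        if info["mac"] == mac:
            has_addr = any(a.split("/")[0] == addr for a in info["ipv4"])
            return iface, info["state"], has_addr
    return None


if __name__ == "__main__":
    iface, state, has_addr = check_host(DHCPD_LINE, IP_ADDR_SAMPLE)
    # prints: p3p1: state=DOWN, has dhcpd address=False
    print(f"{iface}: state={state}, has dhcpd address={has_addr}")
```

Matching on the MAC address rather than the interface name avoids depending on the p3p1/enp65s0f0 naming, since dhcpd only knows the hardware address.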

Steps to reproduce

  • ssh to amd-zen3-gpu-sut1-1.qa.suse.de
  • Collect network interface status
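The second step can also be done from sysfs instead of reading ip addr output by eye. This is a minimal Python sketch, assuming it runs directly on the SUT after the ssh step and that the standard Linux /sys/class/net layout is present; collect_status and read_attr are hypothetical helper names.

```python
from pathlib import Path


def read_attr(iface_dir, name):
    """Return a stripped sysfs attribute value, or None if it cannot be read."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # Reading 'carrier' typically fails (EINVAL) while the interface is administratively down.
        return None


def collect_status(sys_class_net="/sys/class/net"):
    """Collect operstate, carrier and MAC address for every network interface."""
    base = Path(sys_class_net)
    if not base.is_dir():
        return {}  # not a Linux host, or sysfs not mounted
    status = {}
    for iface_dir in sorted(base.iterdir()):
        if not iface_dir.is_dir():
            continue  # skip plain files such as bonding_masters
        status[iface_dir.name] = {
            "operstate": read_attr(iface_dir, "operstate"),
            "carrier": read_attr(iface_dir, "carrier"),
            "address": read_attr(iface_dir, "address"),
        }
    return status


if __name__ == "__main__":
    for name, info in collect_status().items():
        print(name, info)
```

An interface in the state reported here would show operstate "down" and an unreadable carrier, which distinguishes "administratively down" from "up but no cable".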

Problem

A down p3p1 interface could reduce the success rate of SR-IOV tests.

Suggestion

  • Bring the p3p1 interface up and make sure an IP address is configured

Workaround

Not applicable

History

#1 Updated by waynechen55 about 2 months ago

I just found that the config file for the interface had somehow been changed, I guess to 'do not boot automatically'.

I changed it back to
BOOTPROTO='dhcp'
STARTMODE='auto'

Now p3p1 is up and running.
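To catch such a changed start mode earlier, the interface config could be checked for the two values restored here. This is a minimal Python sketch, assuming the file is the SUSE sysconfig file /etc/sysconfig/network/ifcfg-p3p1 (the comment above does not name the file) and that only BOOTPROTO and STARTMODE matter; parse_ifcfg and config_problems are hypothetical helper names.

```python
import shlex
from pathlib import Path

# Values restored in this comment; anything else is reported.
EXPECTED = {"BOOTPROTO": "dhcp", "STARTMODE": "auto"}


def parse_ifcfg(text):
    """Parse the KEY='value' lines of a sysconfig ifcfg file into a dict."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes the surrounding quotes and any trailing '# comment'
        parts = shlex.split(value, comments=True)
        cfg[key.strip()] = parts[0] if parts else ""
    return cfg


def config_problems(cfg):
    """List every expected key whose value differs from EXPECTED."""
    return [f"{key}={cfg.get(key)!r}, expected {want!r}"
            for key, want in EXPECTED.items() if cfg.get(key) != want]


if __name__ == "__main__":
    # Assumed path; adjust to the interface being checked.
    path = Path("/etc/sysconfig/network/ifcfg-p3p1")
    if path.exists():
        print(config_problems(parse_ifcfg(path.read_text())) or "config OK")
    else:
        print(f"{path} not found")
```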

#2 Updated by waynechen55 about 2 months ago

  • Status changed from New to Resolved

Closing as resolved.

#3 Updated by okurz about 2 months ago

We just cross-checked https://racktables.nue.suse.com/index.php?page=object&tab=ports&object_id=16390 and found that the documentation covers this properly: two interfaces are connected. Next time, a tool like ethtool can help find out whether a physical connection is detected.
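Following the ethtool suggestion, here is a minimal Python sketch for checking the physical link, assuming ethtool is installed and prints its usual "Link detected: yes" or "Link detected: no" line; link_detected and check_link are hypothetical helper names. With many drivers the link is only reported once the interface is administratively up (for example after ip link set dev p3p1 up), so a "no" on a DOWN interface is not conclusive on its own.

```python
import shutil
import subprocess


def link_detected(ethtool_output):
    """Return True/False from the 'Link detected:' line, or None if it is absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Link detected":
            return value.strip() == "yes"
    return None


def check_link(iface):
    """Run ethtool on iface; return None if ethtool is missing or fails."""
    if shutil.which("ethtool") is None:
        return None
    result = subprocess.run(["ethtool", iface], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return link_detected(result.stdout)


if __name__ == "__main__":
    for iface in ("p3p1", "p3p2"):
        print(iface, check_link(iface))
```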
