action #114977

closed

kernel-rt server has no network access size:S

Added by pcervinka over 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
QE Kernel - QE Kernel Done
Start date:
2022-08-04
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP4-Server-Full-RT-x86_64-prepare_baremetal@ipmi-kernel-rt fails in
ipxe_install

Test suite description

Maintainer: pcervinka. Proceed with installation on bare metal machines.

Reproducible

Fails since (at least) Build 5.31 (current job)

Expected result

Last good: 5.30 (or more recent)

Further details

Always latest result in this scenario: latest

  • only ipmi connection works
  • server can't boot from the network via PXE
  • server has no ip address from dhcp when booted from local disk
susetest:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b0:3a:f2:b6:05:9f brd ff:ff:ff:ff:ff:ff
    altname enp0s20f0u11u2c2
3: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3c:ec:ef:5d:76:7c brd ff:ff:ff:ff:ff:ff
    altname eno1
    altname enp1s0f0
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:ec:ef:5d:76:7d brd ff:ff:ff:ff:ff:ff
    altname eno2
    altname enp1s0f1
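The `ip a` output above can be condensed into a per-interface summary. The following is a minimal sketch, not part of the original report; the helper name `summarize_links` and the dictionary keys are made up for illustration. It relies only on the flag semantics of `ip`: `UP` means the interface is administratively up, and `LOWER_UP` means a carrier is detected.

```python
import re

# Trimmed copy of the `ip a` output quoted in this ticket.
IP_A_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b0:3a:f2:b6:05:9f brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3c:ec:ef:5d:76:7c brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 3c:ec:ef:5d:76:7d brd ff:ff:ff:ff:ff:ff
"""

HEADER = re.compile(r"^\d+: (?P<name>[^:@]+)[^:]*: <(?P<flags>[^>]*)>")
MAC = re.compile(r"^\s+link/ether (?P<mac>[0-9a-f:]{17})")
# The trailing space keeps "inet6" lines from matching.
INET4 = re.compile(r"^\s+inet (?P<addr>\S+)")


def summarize_links(text):
    """Return one dict per interface with its MAC, link flags and IPv4 addresses."""
    links = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            flags = set(header.group("flags").split(","))
            links.append({
                "name": header.group("name"),
                "mac": None,
                "admin_up": "UP" in flags,        # interface was brought up
                "carrier": "LOWER_UP" in flags,   # physical link detected
                "ipv4": [],
            })
            continue
        if not links:
            continue
        mac = MAC.match(line)
        if mac:
            links[-1]["mac"] = mac.group("mac")
        inet = INET4.match(line)
        if inet:
            links[-1]["ipv4"].append(inet.group("addr"))
    return links


if __name__ == "__main__":
    for link in summarize_links(IP_A_OUTPUT):
        print(link)
```

For the output in this ticket, the summary shows that eth0 and eth1 are not brought up at all, while eth2 is up but reports no carrier, and none of the Ethernet interfaces holds an IPv4 address.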
Actions #1

Updated by pcervinka over 1 year ago

  • Assignee deleted (pcervinka)
Actions #2

Updated by pcervinka over 1 year ago

  • Assignee set to nicksinger

nick confirmed that kernel-rt was moved yesterday:
I will check the switch and let you know. Maybe we connected a broken cable. This could be fixed by us tomorrow when we're in the office again

Actions #3

Updated by nicksinger over 1 year ago

  • Tags changed from baremetal, ipmi to baremetal, ipmi, next-office-day
  • Assignee changed from nicksinger to okurz

@okurz could you please check the connection if you're in the office today? Given the output from the switch and ethtool I suspect a loose cable on the server end. Best would be to completely exchange the cable and throw away the old one if it is somehow broken. Thanks!

Actions #4

Updated by okurz over 1 year ago

  • Due date set to 2022-08-12
  • Status changed from New to Feedback
  • Target version set to Ready

I checked the cable and it looked fine; both the switch and the server end report a link. I tried a different cable on a different switch port, with the same behavior. I have now connected a second cable to the second Ethernet port: eth2 is connected to switch port GE22 on qanet03nue, and the first port is connected to GE07 on qanet03nue. pcervinka confirms that eth2 now receives an IP address and that this is the expected behavior.

Actions #5

Updated by livdywan over 1 year ago

  • Subject changed from kernel-rt server has no network access to kernel-rt server has no network access size:S
Actions #6

Updated by pcervinka over 1 year ago

  • Status changed from Feedback to Workable
  • Assignee changed from okurz to pcervinka
  • Priority changed from High to Normal
  • Target version changed from Ready to 640

@okurz thanks for connecting the 2nd network card and checking on site. The current status is that the server has eth2 up and it is possible to connect, but the installation is broken:

https://openqa.suse.de/tests/9260895/logfile?filename=hardware-console-log.txt

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 3C-EC-EF-5D-76-7C.
  PXE-E16: No valid offer received.

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 3C-EC-EF-5D-76-7C.
  PXE-E16: No valid offer received.

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 3C-EC-EF-5D-76-7D.
  Station IP address is 10.162.3.83

  Server IP address is 10.162.0.1
  NBP filename is kernelqa/ipxe-kernel-rt.efi
  NBP filesize is 945216 Bytes
>>Checking Media Presence......
>>Media Present......
 Downloading NBP file...

  NBP file downloaded successfully.
iPXE initialising devices...ok

iPXE 1.20.1+ (gbdf0e) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP AoE EFI Menu
Chainloading the bootscript now
Configuring (net0 3c:ec:ef:5d:76:7c)...... ok
http://baremetal-support.qa.suse.de:8080/script.ipxe... No such file or directory (http://ipxe.org/2d0c618e)
Welcome to GRUB!

We can see:

  • PXE will not boot from the 1st network card (its MAC address is not configured) - this is expected
  • PXE boots from the 2nd network card (MAC is matched) - this is expected
  • but the iPXE request goes via the 1st card, and this confuses the iPXE support service - the 1st card didn't work in the previous location because of cabling issues

The 1st network card didn't work before, so we configured the server to use the 2nd card, which worked fine until now. It looks like the 1st card works now, and we should try to use it.
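The mismatch described above can be checked mechanically from the hardware console log. The sketch below is only an illustration under assumptions: the function name `pxe_vs_ipxe` is hypothetical, and the sample is a shortened copy of the log lines quoted in this comment. It extracts the MAC on which the firmware PXE stage obtained an address and the MAC that iPXE configured as `net0`.

```python
import re

# Shortened copy of the hardware console log quoted in this ticket.
CONSOLE_LOG = """\
>>Start PXE over IPv4 on MAC: 3C-EC-EF-5D-76-7C.
  PXE-E16: No valid offer received.
>>Start PXE over IPv4 on MAC: 3C-EC-EF-5D-76-7D.
  Station IP address is 10.162.3.83
  Server IP address is 10.162.0.1
  NBP filename is kernelqa/ipxe-kernel-rt.efi
iPXE initialising devices...ok
Configuring (net0 3c:ec:ef:5d:76:7c)...... ok
"""

PXE_START = re.compile(r"Start PXE over IPv4 on MAC: ([0-9A-Fa-f-]{17})")
IPXE_NET = re.compile(r"Configuring \(net\d+ ([0-9A-Fa-f:]{17})\)")


def normalize_mac(mac):
    """Convert 3C-EC-... (firmware style) to 3c:ec:... (iPXE/Linux style)."""
    return mac.lower().replace("-", ":")


def pxe_vs_ipxe(log):
    """Return (MAC that PXE booted from, MAC that iPXE configured)."""
    attempt = None   # MAC of the PXE attempt currently being read
    pxe_mac = None   # MAC of the attempt that received a station IP
    ipxe_mac = None  # MAC iPXE reports for its network device
    for line in log.splitlines():
        start = PXE_START.search(line)
        if start:
            attempt = normalize_mac(start.group(1))
            continue
        if "Station IP address is" in line and attempt:
            pxe_mac = attempt
        net = IPXE_NET.search(line)
        if net:
            ipxe_mac = normalize_mac(net.group(1))
    return pxe_mac, ipxe_mac


if __name__ == "__main__":
    pxe_mac, ipxe_mac = pxe_vs_ipxe(CONSOLE_LOG)
    print("PXE booted from:", pxe_mac)
    print("iPXE configured:", ipxe_mac)
    if pxe_mac != ipxe_mac:
        print("iPXE uses a different card than the firmware PXE stage")
```

On the quoted log this reports 3c:ec:ef:5d:76:7d for the PXE stage and 3c:ec:ef:5d:76:7c for iPXE, which matches the observation that the bootscript request leaves via the 1st card.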

Actions #8

Updated by pcervinka over 1 year ago

The above PR is just preparation; it will stay WIP until I figure out why eth1 with MAC 3c:ec:ef:5d:76:7c is down. It is strange that 3c:ec:ef:5d:76:7c can be found in the DHCP server logs during system boot:

qanet:/var/log # tail -f messages | grep -i  3c:ec:ef:5d:76:7
2022-08-04T15:02:40.240337+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:41.241515+02:00 qanet dhcpd: DHCPOFFER on 10.162.30.35 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:43.605743+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:43.606271+02:00 qanet dhcpd: DHCPOFFER on 10.162.30.35 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:51.624771+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:51.625291+02:00 qanet dhcpd: DHCPOFFER on 10.162.30.35 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:03:07.608347+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:03:07.608890+02:00 qanet dhcpd: DHCPOFFER on 10.162.30.35 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:03:40.777821+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:40.778516+02:00 qanet dhcpd: DHCPOFFER on 10.162.3.83 to 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:44.353449+02:00 qanet dhcpd: DHCPREQUEST for 10.162.3.83 (10.162.0.1) from 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:44.354122+02:00 qanet dhcpd: DHCPACK on 10.162.3.83 to 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:04:18.374682+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:19.375284+02:00 qanet dhcpd: DHCPOFFER on 10.162.32.138 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:20.107379+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:21.107966+02:00 qanet dhcpd: DHCPOFFER on 10.162.32.138 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:23.622815+02:00 qanet dhcpd: DHCPREQUEST for 10.162.32.138 (10.162.0.1) from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:23.623290+02:00 qanet dhcpd: DHCPACK on 10.162.32.138 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:05:51.703795+02:00 qanet dhcpd: uid lease 10.162.32.173 for client 3c:ec:ef:5d:76:7d is duplicate on qa.suse.de
2022-08-04T15:05:51.704592+02:00 qanet dhcpd: DHCPREQUEST for 10.162.3.83 from 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:05:51.705064+02:00 qanet dhcpd: DHCPACK on 10.162.3.83 to 3c:ec:ef:5d:76:7d via sif0
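To see at a glance which MAC address completed a DHCP exchange, the dhcpd messages above can be grouped per client. This is a rough sketch assuming the log format shown in this comment; the function name `leases_by_mac` is hypothetical, and the sample contains only a subset of the quoted lines.

```python
import re
from collections import defaultdict

# Subset of the dhcpd log lines quoted in this ticket.
DHCPD_LOG = """\
2022-08-04T15:02:40.240337+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:02:41.241515+02:00 qanet dhcpd: DHCPOFFER on 10.162.30.35 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:03:40.777821+02:00 qanet dhcpd: DHCPDISCOVER from 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:40.778516+02:00 qanet dhcpd: DHCPOFFER on 10.162.3.83 to 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:44.353449+02:00 qanet dhcpd: DHCPREQUEST for 10.162.3.83 (10.162.0.1) from 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:03:44.354122+02:00 qanet dhcpd: DHCPACK on 10.162.3.83 to 3c:ec:ef:5d:76:7d via sif0
2022-08-04T15:04:19.375284+02:00 qanet dhcpd: DHCPOFFER on 10.162.32.138 to 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:23.622815+02:00 qanet dhcpd: DHCPREQUEST for 10.162.32.138 (10.162.0.1) from 3c:ec:ef:5d:76:7c via sif0
2022-08-04T15:04:23.623290+02:00 qanet dhcpd: DHCPACK on 10.162.32.138 to 3c:ec:ef:5d:76:7c via sif0
"""

KIND = re.compile(r"(DHCP[A-Z]+)\b")
MAC = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b")
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def leases_by_mac(log):
    """Map each client MAC to the sets of addresses it was offered and ACKed."""
    result = defaultdict(lambda: {"offered": set(), "acked": set()})
    for line in log.splitlines():
        if "dhcpd:" not in line:
            continue
        # Only look at the message part, so the timestamp is never parsed.
        message = line.split("dhcpd:", 1)[1]
        kind = KIND.search(message)
        mac = MAC.search(message)
        ip = IPV4.search(message)
        if not (kind and mac and ip):
            continue
        if kind.group(1) == "DHCPOFFER":
            result[mac.group(0)]["offered"].add(ip.group(0))
        elif kind.group(1) == "DHCPACK":
            result[mac.group(0)]["acked"].add(ip.group(0))
    return dict(result)


if __name__ == "__main__":
    for mac, info in sorted(leases_by_mac(DHCPD_LOG).items()):
        print(mac, "offered:", sorted(info["offered"]), "acked:", sorted(info["acked"]))
```

On the quoted lines, both MAC addresses end up with an acknowledged lease (10.162.32.138 for 3c:ec:ef:5d:76:7c and 10.162.3.83 for 3c:ec:ef:5d:76:7d), so the DHCP server does see traffic from the 1st card during boot.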
Actions #9

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9323161#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #10

Updated by slo-gin over 1 year ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #11

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9428194#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 40 days if nothing changes in this ticket.

Actions #12

Updated by slo-gin over 1 year ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #13

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9428194#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #14

Updated by slo-gin over 1 year ago

This ticket is 10 days after the due-date. Please consider closing this ticket or move the due-date accordingly.

Actions #15

Updated by pcervinka over 1 year ago

  • Tags changed from baremetal, ipmi, next-office-day to baremetal, ipmi
  • Due date deleted (2022-08-12)

Agreed with @MMoese that he will disconnect the cable next week.

Actions #16

Updated by okurz over 1 year ago

  • Status changed from Workable to Feedback

dheidler was in the lab and disconnected the cable to the 1st interface. So all good now?

Actions #17

Updated by pcervinka over 1 year ago

  • Status changed from Feedback to Resolved
  • Target version changed from 640 to QE Kernel Done
Actions #18

Updated by openqa_review over 1 year ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9428194#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #19

Updated by pcervinka over 1 year ago

  • Status changed from Feedback to Resolved
Actions #20

Updated by openqa_review over 1 year ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9428194#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 76 days if nothing changes in this ticket.

Actions #21

Updated by pcervinka about 1 year ago

  • Status changed from Feedback to Resolved
Actions #22

Updated by openqa_review about 1 year ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: prepare_baremetal@ipmi-kernel-rt
https://openqa.suse.de/tests/9428194#step/ipxe_install/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 72 days if nothing changes in this ticket.

Actions #23

Updated by pcervinka about 1 year ago

@okurz why is this ticket reopened, just because of tag on 6 month old job?

Actions #24

Updated by pcervinka about 1 year ago

  • Status changed from Feedback to Resolved
Actions #25

Updated by okurz 10 months ago

pcervinka wrote:

@okurz why is this ticket reopened, just because of tag on 6 month old job?

Because openqa-review cannot decide which age is "too old" for each of the respective products. This is why it's unconditional. The three suggestions in the comment still hold. If you think none of them apply to this ticket or the referenced openQA job, I would be happy to hear your suggestions or discuss further options.

Actions #26

Updated by pcervinka 10 months ago

okurz wrote:

Because openqa-review cannot decide which age is "too old" for each of the respective products. This is why it's unconditional. The three suggestions in the comment still hold. If you think none of them apply to this ticket or the referenced openQA job, I would be happy to hear your suggestions or discuss further options.

Thank you, I understand now.
