action #150938

closed

[openQA][sut][ipmi] No ipmi sol output with ix64ph1075 size:M

Added by waynechen55 7 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
-
Target version:
Start date:
2023-11-16
Due date:
% Done:

0%

Estimated time:

Description

Observation

Test runs assigned to imagetester:7 started failing at ipxe_install, for example https://openqa.suse.de/tests/12822901#step/ipxe_install/1. It looks like a needle matching failure, but actually nothing is printed on the machine's IPMI SoL console after reboot.

ipmitool -I lanplus -C 3 -H ix64ph1075-sp.qe.nue2.suse.org -U admin -P xxxxxxxx sol activate

Steps to reproduce

  • Connect to ix64ph1075 ipmi sol console
  • Reboot the machine
  • Wait for output on ipmi sol console
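The steps above can be sketched as a small script. This is only a sketch: the `ipmi` and `reproduce` helpers and the `RUN` switch are hypothetical names, and it assumes the BMC password is exported as `IPMI_PASSWORD` (read by ipmitool's `-E` option) rather than passed with `-P`. Because `sol activate` blocks, the sketch reboots first and attaches to the console afterwards; with two terminals the order from the list can be kept.

```shell
#!/bin/sh
# Sketch only: prints the ipmitool commands unless RUN=1 is set.
BMC_HOST="${BMC_HOST:-ix64ph1075-sp.qe.nue2.suse.org}"
BMC_USER="${BMC_USER:-admin}"

# Common ipmitool prefix; -E takes the password from $IPMI_PASSWORD.
ipmi() {
    if [ "${RUN:-0}" = "1" ]; then
        ipmitool -I lanplus -C 3 -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    else
        echo "ipmitool -I lanplus -C 3 -H $BMC_HOST -U $BMC_USER -E $*"
    fi
}

reproduce() {
    ipmi power reset    # reboot the machine
    ipmi sol activate   # attach to the SoL console and wait for output
}

reproduce
```

Running it without `RUN=1` just prints the two commands, which makes it easy to review them before touching the machine.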

Impact

No test run assigned to imagetester:7 (now imagetester:6) can proceed.

Problem

  • Something appears to be wrong with the IPMI SoL console

Suggestions

  • Check ipmi sol config
  • Check warning/error in BMC
  • Factory-reset the BMC
  • Reinstall the firmware
  • Click every possible button
  • Check that the physical ethernet cable is not broken

Workaround

n/a

Rollback actions

  • sudo systemctl unmask openqa-worker-auto-restart@6 && sudo systemctl enable --now openqa-worker-auto-restart@6
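For reference, a sketch of the rollback action parameterized by worker slot, together with what is assumed to be its inverse (taking the slot out of service). The helper names are hypothetical, and the exact commands used to mask the instance are not recorded in this ticket, so `disable_slot_cmd` is an assumption. The functions only print the commands.

```shell
#!/bin/sh
# Sketch only: prints systemctl commands; run them on the worker host to apply.

# systemd unit name for a given openQA worker slot number.
unit_for_slot() {
    echo "openqa-worker-auto-restart@$1"
}

# Take a worker slot out of service (assumed inverse of the rollback action).
disable_slot_cmd() {
    unit=$(unit_for_slot "$1")
    echo "sudo systemctl disable --now $unit && sudo systemctl mask $unit"
}

# Rollback action from this ticket, parameterized by slot number.
enable_slot_cmd() {
    unit=$(unit_for_slot "$1")
    echo "sudo systemctl unmask $unit && sudo systemctl enable --now $unit"
}

enable_slot_cmd 6
```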

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #155659: [openQA][infra][sut] Failed to establish connnection to ix64ph1075-sp.qe.nue2.suse.org (Resolved, okurz, 2024-02-20)

Actions #1

Updated by okurz 7 months ago

  • Tags set to infra, ipmi, ix64ph1075
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready
Actions #2

Updated by okurz 7 months ago

  • Description updated (diff)

Masked the worker instance and called mc reset; now trying power reset and sol activate again.

Actions #3

Updated by okurz 7 months ago

  • Tags changed from infra, ipmi, ix64ph1075 to infra, ipmi, ix64ph1075, next-frankencampus-visit

Using

ipmi… mc reset cold
ipmi… power off
ipmi… power on
ipmi… sol activate

No output after a sufficient waiting time. We should physically check the machine.
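A non-interactive variant of this check could look like the sketch below. The helper names, the `DRY_RUN` switch and the 300-second default are hypothetical. It assumes the password is exported as `IPMI_PASSWORD`, that `timeout` from coreutils is available, that `sol activate` works without a terminal attached, and that ipmitool's own status lines start with `[SOL`. A real run would also need a pause after `mc reset cold` until the BMC is reachable again.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (default) prints the commands instead of running them.
BMC_HOST="${BMC_HOST:-ix64ph1075-sp.qe.nue2.suse.org}"
BMC_USER="${BMC_USER:-admin}"
DRY_RUN="${DRY_RUN:-1}"

# Common ipmitool prefix; -E takes the password from $IPMI_PASSWORD.
ipmi() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ipmitool -I lanplus -C 3 -H $BMC_HOST -U $BMC_USER -E $*"
    else
        ipmitool -I lanplus -C 3 -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    fi
}

# Count captured lines with visible characters, skipping ipmitool's own
# status lines (assumption: those start with "[SOL").
sol_output_lines() {
    grep -v '^\[SOL' "$1" | grep -c '[[:graph:]]'
}

# Reset the BMC, power-cycle the host, then capture SoL for a fixed time.
reset_and_capture() {
    log="$1"
    wait_seconds="${2:-300}"   # hypothetical "sufficient waiting time"
    ipmi mc reset cold
    ipmi power off
    ipmi power on
    if [ "$DRY_RUN" = "1" ]; then
        ipmi sol activate
    else
        # timeout(1) ends the otherwise interactive SoL session.
        timeout "$wait_seconds" ipmitool -I lanplus -C 3 -H "$BMC_HOST" \
            -U "$BMC_USER" -E sol activate > "$log"
    fi
}
```

After a real run, `sol_output_lines /path/to/log` printing `0` would correspond to the "no output" observation here.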

Actions #4

Updated by livdywan 7 months ago

  • Subject changed from [openQA][sut][ipmi] No ipmi sol output with ix64ph1075 to [openQA][sut][ipmi] No ipmi sol output with ix64ph1075 size:M
  • Description updated (diff)
Actions #5

Updated by okurz 7 months ago

  • Status changed from In Progress to Blocked

Over http://ix64ph1075-sp.qe.nue2.suse.org/ I could not find out the specifics of the hardware, and I can't check physically due to the defective FC basement access control. That is tracked in #150830, so blocking on that.

Actions #6

Updated by waynechen55 7 months ago

If you find any other SUT machines misbehaving, please also update here. @xlai @Julie_CAO @nanzhang @rcai @xguo

Actions #7

Updated by okurz 7 months ago

  • Description updated (diff)
Actions #9

Updated by okurz 7 months ago

I checked the machine physically. I pulled out all cables and reconnected them. Checking the machine over IPMI again. Still no output. I can now see output over SoL again, so the original problem was resolved. Regarding a potential firmware upgrade, I took pictures to find out what machine this could be.

From stickers on the hardware I could find SA2260A308R, OCC erdmann/25-FEB-13 transtec C82500B47M20062 P/N C8E-825 Factory Code: ABC-02

Possibly it's https://www.supermicro.com/en/products/chassis/2U/825/SC825TQC-R1K03LPB with motherboard https://www.supermicro.com/en/products/motherboard/X11DPi-N

Trying a BMC firmware upgrade, although I doubt that https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X11DPI-N/BIOS provides the correct versions.

Actions #10

Updated by okurz 7 months ago

  • Tags changed from infra, ipmi, ix64ph1075, next-frankencampus-visit to infra, ipmi, ix64ph1075
  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

The firmware upgrade attempt never finished for me. I double-checked IPMI and can power off/on the machine but could not see anything on SoL. Can someone else please try to give this a go?

Actions #11

Updated by nicksinger 7 months ago

Unfortunately I also had no success with the IPMIView Java application. Just a black screen for that host.

Actions #12

Updated by okurz 6 months ago

  • Target version changed from Ready to Tools - Next
Actions #13

Updated by okurz 5 months ago

  • Target version changed from Tools - Next to Ready
Actions #14

Updated by okurz 5 months ago

  • Priority changed from Normal to Low
Actions #15

Updated by nicksinger 5 months ago

  • Assignee set to nicksinger
Actions #16

Updated by nicksinger 5 months ago

  • Status changed from Workable to In Progress

In the Java IPMIViewer application I was able to find the board model under the "IPMI Device" tab. It states "X9DR3/i-F", which corresponds to https://www.supermicro.com/products/archive/motherboard/x9dr3-f. Unfortunately the official firmware download link times out for me and for others as well. The IPMI platform in use seems to be called "X9", so I searched for "X9_" on https://www.supermicro.com/support/resources/bios_ipmi.php?vendor=1 and downloaded the newest version I could find ("3.64"), which I was able to flash over the present version 3.62. This had no effect and did not improve our situation. Maybe we had a wrong version all along, so I will try to find an archived version (even if older) to have a "known good" one.

Actions #17

Updated by nicksinger 5 months ago

Flashing 3.16 completely broke the web interface. After spending way too much time researching how to flash the firmware by other means, I found a tool included in the firmware download itself. It is currently running to see if I can recover the web interface.

Actions #18

Updated by nicksinger 5 months ago

I had multiple failures yesterday with the lUpdate tool provided by Supermicro. Today I tried it from a host within the same network (monitor) and it worked flawlessly, at least to recover to the newest version again.

Actions #19

Updated by okurz 5 months ago

As discussed, please contact Supermicro support and ask if they can help with the missing output on IPMI SoL. You already tried reflashing the firmware and applying the latest publicly available update. They might have a more recent version as well.

Actions #20

Updated by nicksinger 5 months ago

  • Status changed from In Progress to Resolved

I found a super helpful reddit post (https://www.reddit.com/r/homelab/comments/j2l7w2/supermicro_x9drif_with_quad_nvme_success/) linking to some different resources from Supermicro. There I found an "official" download for the latest firmware of our platform: https://www.supermicro.com/en/support/resources/downloadcenter/MBD-X9DR3-F. I installed firmware version 3.62 via the web interface. Afterwards I installed the newest BIOS (also linked on that page), which allowed me to reset all BIOS settings in the process. Once the BIOS update was complete the system still didn't output anything, so I cut the power completely using the PDU and left it powered off for ~30 minutes to give every component a chance to initialize completely. Lo and behold: after "plugging" the system back into power and booting it, I received graphical output on the web interface, in the IPMIViewer utility and also via serial-over-LAN. I have now shut down the system, and it can be powered back on via the usual means.

Actions #21

Updated by okurz 5 months ago

That's awesome! Only the rollback and verification steps were missing. I have conducted the rollback step and tried to look up an openQA job to verify but couldn't find any. So let's resolve without verification but with the worker slot enabled again, see https://openqa.suse.de/admin/workers/3495

Actions #22

Updated by okurz 4 months ago

  • Related to action #155659: [openQA][infra][sut] Failed to establish connnection to ix64ph1075-sp.qe.nue2.suse.org added