action #150938
closed[openQA][sut][ipmi] No ipmi sol output with ix64ph1075 size:M
Description
Observation
Test runs on imagetester:7 started failing at ipxe_install, for example https://openqa.suse.de/tests/12822901#step/ipxe_install/1. It looks like a needle matching failure, but actually nothing is printed on the IPMI SoL console after reboot.
ipmitool -I lanplus -C 3 -H ix64ph1075-sp.qe.nue2.suse.org -U admin -P xxxxxxxx sol activate
Steps to reproduce
- Connect to ix64ph1075 ipmi sol console
- Reboot the machine
- Wait for output on ipmi sol console
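The steps above can be sketched as a pair of shell helpers. This is a minimal sketch under stated assumptions, not the procedure that was actually used in this ticket: the function names `ipmi` and `sol_has_output`, the `DRY_RUN` switch and the log file name are made up for illustration, and the password is assumed to be exported as `IPMI_PASSWORD` (which ipmitool reads when `-E` is given). Host, user and the remaining options are taken from the command in the observation.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of openQA or of this ticket.
BMC_HOST="${BMC_HOST:-ix64ph1075-sp.qe.nue2.suse.org}"
BMC_USER="${BMC_USER:-admin}"

# Run an ipmitool command against the BMC, or only print it when DRY_RUN=1.
# -E tells ipmitool to read the password from the IPMI_PASSWORD variable.
ipmi() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ipmitool -I lanplus -C 3 -H $BMC_HOST -U $BMC_USER -E $*"
    else
        ipmitool -I lanplus -C 3 -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    fi
}

# Succeed if the captured console log contains any non-whitespace character.
sol_has_output() {
    [ -s "$1" ] && grep -q '[^[:space:]]' "$1"
}

# Possible use (assumed workflow):
#   ipmi power reset
#   ipmi sol activate | tee sol.log    # leave the session with "~." after waiting
#   sol_has_output sol.log && echo "console alive" || echo "no SoL output"
```

Checking a captured log instead of watching the terminal makes "no output" an explicit pass/fail result, which is easier to record in a ticket than a visual impression.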
Impact
No test run assigned to imagetester:7 can proceed. Now imagetester:6
Problem
- Something appears to be wrong with the IPMI SoL console
Suggestions
- Check ipmi sol config
- Check warning/error in BMC
- Factory-reset the BMC
- Reinstall the firmware
- Click every possible button
- Check that the physical ethernet cable is not broken
Workaround
n/a
Rollback actions
sudo systemctl unmask openqa-worker-auto-restart@6 && sudo systemctl enable --now openqa-worker-auto-restart@6
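For reference, the rollback above and its likely counterpart can be sketched as one helper. The ticket only says that the worker instance was masked and does not record the exact command, so the `mask` branch below is an assumption that simply mirrors the rollback; `worker_slot_cmd` is a hypothetical name. The function prints the command line instead of running it, so it is safe to try.

```shell
#!/bin/sh
# Hypothetical helper: print the systemctl command line that takes an openQA
# worker instance out of service ("mask") or brings it back ("unmask").
worker_slot_cmd() {
    action=$1
    unit="openqa-worker-auto-restart@$2"
    case $action in
        # Assumed counterpart of the rollback; not recorded in the ticket.
        mask)   echo "sudo systemctl disable --now $unit && sudo systemctl mask $unit" ;;
        # Same command as in the rollback actions.
        unmask) echo "sudo systemctl unmask $unit && sudo systemctl enable --now $unit" ;;
        *)      echo "usage: worker_slot_cmd mask|unmask INSTANCE" >&2; return 2 ;;
    esac
}

# Example: worker_slot_cmd unmask 6
```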
Updated by okurz about 1 year ago
- Tags set to infra, ipmi, ix64ph1075
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
Updated by okurz about 1 year ago
- Description updated (diff)
Masked the worker instance and called mc reset, now power reset and sol activate again.
Updated by okurz about 1 year ago
- Tags changed from infra, ipmi, ix64ph1075 to infra, ipmi, ix64ph1075, next-frankencampus-visit
Using
ipmi… mc reset cold
ipmi… power off
ipmi… power on
ipmi… sol activate
there is still no output after a sufficient waiting time. We should physically check the machine.
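The sequence above could be scripted roughly as follows. This is a sketch, not the commands as they were typed: the full ipmitool prefix is abbreviated in this comment, so the function takes it as its arguments rather than guessing it. The function name `bmc_power_cycle` and the two wait times are assumptions; the ticket does not say how long the waits were.

```shell
#!/bin/sh
# Hypothetical helper. "$@" is the command prefix used to reach the BMC,
# for example: bmc_power_cycle ipmitool -I lanplus -H <bmc-host> -U <user> -E
bmc_power_cycle() {
    "$@" mc reset cold || return 1
    # Assumed wait: give the BMC time to come back after the cold reset.
    sleep "${BMC_RESET_WAIT:-60}"
    "$@" power off || return 1
    # Assumed wait: let the host power down before switching it on again.
    sleep "${POWER_OFF_WAIT:-10}"
    "$@" power on || return 1
}

# Afterwards attach to the console interactively with "<prefix> sol activate".
```

Passing `echo` as the prefix, with both waits set to 0, prints the three commands without touching any hardware.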
Updated by livdywan about 1 year ago
- Subject changed from [openQA][sut][ipmi] No ipmi sol output with ix64ph1075 to [openQA][sut][ipmi] No ipmi sol output with ix64ph1075 size:M
- Description updated (diff)
Updated by okurz about 1 year ago
- Status changed from In Progress to Blocked
Over http://ix64ph1075-sp.qe.nue2.suse.org/ I could not find out the specifics of the hardware, and I can't check physically due to the defective FC basement access control. I am tracking that in #150830, so blocking on that.
Updated by waynechen55 about 1 year ago
If you find any other SUT machines misbehaving, please also update here. @xlai @Julie_CAO @nanzhang @rcai @xguo
Updated by okurz about 1 year ago
I checked the machine physically: I pulled out all cables and reconnected them. Checking the machine over IPMI again, I can now see output over SoL, so the original problem was resolved. Regarding a potential firmware upgrade, I took pictures to find out what machine this could be.
From stickers on the hardware I could find SA2260A308R, OCC erdmann/25-FEB-13 transtec C82500B47M20062 P/N C8E-825 Factory Code: ABC-02
Possibly it's https://www.supermicro.com/en/products/chassis/2U/825/SC825TQC-R1K03LPB with motherboard https://www.supermicro.com/en/products/motherboard/X11DPi-N
Trying a BMC firmware upgrade, although I doubt that https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X11DPI-N/BIOS provides the versions that would be needed.
Updated by okurz about 1 year ago
- Tags changed from infra, ipmi, ix64ph1075, next-frankencampus-visit to infra, ipmi, ix64ph1075
- Status changed from Blocked to Workable
- Assignee deleted (okurz)
The firmware upgrade attempt never finished for me. I double-checked IPMI and can power off/on the machine but could not see anything on SoL. Can someone else please try to give this a go?
Updated by nicksinger about 1 year ago
Unfortunately I also had no success with the IPMIView Java application. Just a black screen for that host.
Updated by nicksinger 10 months ago
- Status changed from Workable to In Progress
In the Java IPMIView application I was able to find the board model under the "IPMI Device" tab. It states "X9DR3/i-F", which corresponds to https://www.supermicro.com/products/archive/motherboard/x9dr3-f. Unfortunately the official firmware download link times out for me and for others as well. The IPMI platform in use seems to be called "X9", so I searched for "X9_" on https://www.supermicro.com/support/resources/bios_ipmi.php?vendor=1 and downloaded the newest firmware I could find ("3.64"), which I was able to flash over the present version 3.62. This did not improve our situation. Maybe we had a wrong version all along, so I will try to find an archived version (even if older) to have a "known good".
Updated by nicksinger 10 months ago
Flashing 3.16 completely broke the web interface. After spending way too much time researching how to flash the firmware by other means, I found a tool included in the firmware download itself. It is currently running to see if I can recover the web interface.
Updated by nicksinger 10 months ago
I had multiple failures yesterday with the lUpdate tool provided by Supermicro. Today I ran it from a host within the same network (monitor) and it worked flawlessly, at least to recover to the newest version again.
Updated by nicksinger 10 months ago
- Status changed from In Progress to Resolved
I found a very helpful Reddit post (https://www.reddit.com/r/homelab/comments/j2l7w2/supermicro_x9drif_with_quad_nvme_success/) linking to several Supermicro resources. There I found an "official" download for the latest firmware of our platform: https://www.supermicro.com/en/support/resources/downloadcenter/MBD-X9DR3-F. I installed firmware version 3.62 via the web interface. Afterwards I installed the newest BIOS (also linked on that page), which allowed me to reset all BIOS settings in the process. Once the BIOS update was complete the system still did not output anything, so I cut the power completely using the PDU and left the machine powered off for ~30 minutes to give every component a chance to initialize completely. Lo and behold: after "plugging" the system back into power and booting it, I received graphical output on the web interface, in the IPMIView utility and also via serial-over-LAN. I have shut down the system now; it can be powered back on via the usual means.
Updated by okurz 10 months ago
That's awesome! Only the rollback and verification steps were missing. I have conducted the rollback step and tried to look up an openQA job to verify but couldn't find any. So let's resolve without verification but with the worker slot enabled again, see https://openqa.suse.de/admin/workers/3495
Updated by okurz 10 months ago
- Related to action #155659: [openQA][infra][sut] Failed to establish connnection to ix64ph1075-sp.qe.nue2.suse.org added