action #174448 (open)
bare-metal5 and bare-metal6 fail to boot from PXE most times
0%
Description
Brought up in #174352#note-8
Worker slot in question: https://openqa.suse.de/admin/workers/3992
https://openqa.suse.de/tests/16190397 is a good example.
[2024-12-13T10:57:41.004199Z] [debug] [pid:38779] setting iPXE bootscript on http://baremetal-support.qe.prg2.suse.org for 10.146.4.107 to:
#!ipxe
echo ++++++++++++++++++++++++++++++++++++++++++
echo ++++++++++++ openQA ipxe boot ++++++++++++
echo + Host: bare-metal5.qe.prg2.suse.org
echo ++++++++++++++++++++++++++++++++++++++++++
kernel http://openqa.suse.de/assets/repo/fixed/SLE-15-SP6-Online-x86_64-GM-Media1/boot/x86_64/loader/linux install=http://openqa.suse.de/assets/repo/fixed/SLE-15-SP6-Online-x86_64-GM-Media1 root=/dev/ram0 initrd=initrd textmode=1 autoyast=http://worker36.oqa.prg2.suse.org:20623/PnzpD17aUM9vS2E7/files/bare-metal5.qe.prg2.suse.orgvirt_autotest/host_unattended_installation_files/autoyast/dev_host_15.xml sshd=1 sshpassword=nots3cr3t plymouth.enable=0 video=1024x768 vt.color=0x07 console=ttyS1,115200 Y2DEBUG=1 linuxrc.log=/dev/ttyS1 linuxrc.core=/dev/ttyS1 linuxrc.debug=4,trace reboot_timeout=0
initrd http://openqa.suse.de/assets/repo/fixed/SLE-15-SP6-Online-x86_64-GM-Media1/boot/x86_64/loader/initrd
boot
[2024-12-13T10:57:41.008929Z] [debug] [pid:38779] 200 OK
[2024-12-13T10:57:41.009014Z] [debug] [pid:38779] setting boot device to pxe
[2024-12-13T10:57:41.068997Z] [debug] [pid:38779] IPMI: Set Boot Device to pxe
[2024-12-13T10:57:44.131260Z] [debug] [pid:38779] IPMI: Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: a004000000
Boot Flags :
- Boot Flag Valid
- Options apply to only next boot
- BIOS EFI boot
- Boot Device Selector : Force PXE
- BIOS verbosity : System Default
- Console Redirection control : Console redirection occurs per BIOS configuration setting (default)
- BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST
[2024-12-13T10:57:44.189398Z] [debug] [pid:38779] IPMI: Chassis Power Control: Up/On
[2024-12-13T10:57:47.248980Z] [debug] [pid:38779] IPMI: Chassis Power is off
[2024-12-13T10:57:47.309820Z] [debug] [pid:38779] IPMI: Chassis Power Control: Up/On
[2024-12-13T10:57:50.368759Z] [debug] [pid:38779] IPMI: Chassis Power is off
[2024-12-13T10:57:50.461045Z] [debug] [pid:38779] IPMI: Chassis Power Control: Up/On
[2024-12-13T10:57:53.516036Z] [debug] [pid:38779] IPMI: Chassis Power is on
A frame-by-frame analysis of the video shows no attempt to perform a PXE boot.
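To double-check what the BMC actually stored, the "Boot parameter data: a004000000" bytes in the log can be decoded by hand. Below is a minimal Python sketch, assuming the boot-flags layout of IPMI 2.0 boot option parameter 5. The function name decode_boot_flags is hypothetical and only the fields printed in the log are covered.

```python
# Minimal sketch: decode the 5-byte "Boot parameter data" of IPMI boot
# option parameter 5 (boot flags), assuming the IPMI 2.0 bit layout.
# Only a subset of fields is decoded; all other bits are ignored.

DEVICE_SELECTOR = {
    0b0000: "No override",
    0b0001: "Force PXE",
    0b0010: "Force boot from default Hard-Drive",
    0b0011: "Force boot from default Hard-Drive, request Safe Mode",
    0b0100: "Force boot from default Diagnostic Partition",
    0b0101: "Force boot from default CD/DVD",
    0b0110: "Force boot into BIOS Setup",
}

VERBOSITY = {0b00: "System Default", 0b01: "Quiet", 0b10: "Verbose"}


def decode_boot_flags(hex_data: str) -> dict:
    data = bytes.fromhex(hex_data)
    if len(data) != 5:
        raise ValueError(f"expected 5 bytes, got {len(data)}")
    d1, d2, d3 = data[0], data[1], data[2]
    return {
        "valid": bool(d1 & 0x80),       # data 1, bit 7: boot flag valid
        "persistent": bool(d1 & 0x40),  # data 1, bit 6: 0 = next boot only
        "efi": bool(d1 & 0x20),         # data 1, bit 5: 1 = EFI, 0 = legacy
        # data 2, bits 5:2: boot device selector
        "device": DEVICE_SELECTOR.get((d2 >> 2) & 0x0F, "Other/reserved"),
        # data 3, bits 6:5: firmware verbosity
        "verbosity": VERBOSITY.get((d3 >> 5) & 0x03, "Reserved"),
    }


if __name__ == "__main__":
    print(decode_boot_flags("a004000000"))
```

Decoding a004000000 yields a valid, next-boot-only, EFI, force-PXE request, which matches ipmitool's own interpretation. This fits the observation in the thread that the override is stored on the BMC but the firmware does not follow it.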
Updated by dheidler 4 days ago
- Related to action #174352: 2 ipmi backend baremetal machines in OSD worker pool are offline size:S added
Updated by dheidler 4 days ago
- File frame0114.png frame0114.png added
- File frame0115.png frame0115.png added
- File frame0148.png frame0148.png added
- Description updated (diff)
Updated by dheidler 4 days ago · Edited
- Status changed from New to In Progress
- Assignee set to dheidler
- Took bare-metal5 out of production
- PXE or BIOS boot device selection for the next boot via ipmitool is NOT followed (at least when using
    chassis bootdev pxe
    chassis power off
    chassis power on
  despite chassis bootparam get 5 showing otherwise)
- Disabled quiet boot via BIOS setup
- Still no change
- Deleted the SLES boot entry and moved network boot to the first position in the boot order
- Re-enabled bare-metal5 for production
- Let's see: https://openqa.suse.de/tests/16203423#live
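The reproduction steps above can be scripted so they are repeated identically on bare-metal5 and bare-metal6. Below is a minimal Python sketch that only assembles the ipmitool command lines and does not execute them. It assumes the BMC is reachable over the lanplus interface. The helper names and the USER/PASSWORD values are placeholders.

```python
import shlex

# Minimal sketch: build the ipmitool invocations used to reproduce the issue
# (request PXE for the next boot, power cycle, read back boot parameter 5).
# Commands are only assembled here, not executed.


def ipmitool_cmd(host: str, user: str, password: str, *args: str) -> list:
    # -I lanplus selects the IPMI v2.0 RMCP+ LAN interface of the BMC
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]


def pxe_reproduction_steps(host: str, user: str, password: str) -> list:
    steps = [
        ("chassis", "bootdev", "pxe"),
        ("chassis", "power", "off"),
        ("chassis", "power", "on"),
        ("chassis", "bootparam", "get", "5"),
    ]
    return [ipmitool_cmd(host, user, password, *step) for step in steps]


if __name__ == "__main__":
    # Placeholder credentials; substitute the real BMC login
    for cmd in pxe_reproduction_steps("bare-metal5-ipmi.qe.prg2.suse.org", "USER", "PASSWORD"):
        print(shlex.join(cmd))
```

Building argument lists instead of strings makes it easy to pass them to subprocess.run later without shell quoting issues.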
Updated by xlai 3 days ago
Julie_CAO wrote in #note-5:
Hi @xlai, is there a USB stick on this machine? Or do you know of any recent changes to this machine? It used to run tests fine but now fails to boot.
No, the USB sticks are on bare-metal{1,2}. I am not aware of any changes to these two machines.
Updated by Julie_CAO 3 days ago
Thanks for these actions, but it still failed to boot in https://openqa.suse.de/tests/16203423
Updated by openqa_review 3 days ago
- Due date set to 2024-12-31
Setting due date based on mean cycle time of SUSE QE Tools
Updated by dheidler 3 days ago
redfishtool -r bare-metal5-ipmi.qe.prg2.suse.org -u ***** -p ***** Systems get | jq .Boot
{
  "BootSourceOverrideEnabled": "Once",
  "BootSourceOverrideMode": "UEFI",
  "BootSourceOverrideTarget": "Pxe",
  "BootSourceOverrideTarget@Redfish.AllowableValues": [
    "None",
    "Pxe",
    "Floppy",
    "Cd",
    "Usb",
    "Hdd",
    "BiosSetup",
    "UsbCd",
    "UefiBootNext",
    "UefiHttp"
  ],
  "BootOptions": {
    "@odata.id": "/redfish/v1/Systems/1/BootOptions"
  },
  "BootNext": null,
  "BootOrder": [
    "Boot0000",
    "Boot0003",
    "Boot0004",
    "Boot0002"
  ]
}
Redfish reports the same boot override settings as ipmitool does.
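The Redfish check can be automated against the Boot object printed above. Below is a minimal Python sketch, assuming the standard Redfish ComputerSystem Boot properties shown in that output and the system path /redfish/v1/Systems/1 implied by its BootOptions link. The helper names are hypothetical, and the sample JSON is an abridged copy of the output.

```python
import json

# Boot object as printed by "redfishtool ... Systems get | jq .Boot" (abridged)
SAMPLE = """{
  "BootSourceOverrideEnabled": "Once",
  "BootSourceOverrideMode": "UEFI",
  "BootSourceOverrideTarget": "Pxe",
  "BootSourceOverrideTarget@Redfish.AllowableValues": ["None", "Pxe", "Hdd", "BiosSetup"],
  "BootNext": null,
  "BootOrder": ["Boot0000", "Boot0003", "Boot0004", "Boot0002"]
}"""


def pending_pxe_override(boot: dict) -> bool:
    """True if the BMC has a one-time PXE override stored for the next boot."""
    return (boot.get("BootSourceOverrideEnabled") == "Once"
            and boot.get("BootSourceOverrideTarget") == "Pxe")


def pxe_override_patch(boot: dict, mode: str = "UEFI") -> dict:
    """Body a Redfish client could PATCH to /redfish/v1/Systems/1 to request PXE once."""
    allowed = boot.get("BootSourceOverrideTarget@Redfish.AllowableValues", [])
    if "Pxe" not in allowed:
        raise ValueError("BMC does not list Pxe as an allowed override target")
    return {"Boot": {"BootSourceOverrideEnabled": "Once",
                     "BootSourceOverrideTarget": "Pxe",
                     "BootSourceOverrideMode": mode}}


if __name__ == "__main__":
    boot = json.loads(SAMPLE)
    print("pending one-time PXE override:", pending_pxe_override(boot))
    print(json.dumps(pxe_override_patch(boot)))
```

A pending override that is still present after a power cycle would suggest the firmware ignored it, rather than the request never reaching the BMC.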
Updated by dheidler about 23 hours ago
- Status changed from In Progress to Workable
- Assignee deleted (dheidler)
I'm out of ideas here.
Updated by xlai about 22 hours ago
Has anyone tried whether selecting PXE boot manually from the BIOS works? I would also suggest checking the boot options in the BIOS and allowing only PXE boot and disk boot.
@xlai shall we pull them out of the OSD worker pool until the ticket is resolved?
Yes, please do.
Updated by MMoese about 21 hours ago
Some machines simply refuse to honor IPMI-controlled boot device selection.
With our kernel bare-metal hardware, we solved this with a workaround: we set those machines to always boot from PXE first, with their first NVMe as the second boot device. They always PXE-boot and get their bootscript from the bare-metal support service. When we don't want to install them, we just use https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/7c85092bafb2a6ca9f64d91b871544790da683ce/tests/installation/ipxe_install.pm#L126 and the machine boots from disk. I've mostly observed this with UEFI machines and don't remember encountering it with legacy boot.
Not sure if this helps you.
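The "always PXE first" workaround depends on the bootscript deciding between installing and falling through to the disk. Below is a minimal Python sketch of such a generator, assuming that iPXE's exit command returns control to the firmware so it tries the next boot entry. On UEFI this behavior depends on the firmware. The function ipxe_script is hypothetical, and the linked ipxe_install.pm may use a different mechanism.

```python
# Minimal sketch of the "always PXE first" workaround: the machine always
# PXE-boots and fetches a bootscript, which either starts an installation
# or hands control back to the firmware so the next boot entry (disk) is used.


def ipxe_script(host: str, install: bool, kernel_url: str = "",
                initrd_url: str = "", cmdline: str = "") -> str:
    lines = ["#!ipxe", f"echo openQA ipxe boot for {host}"]
    if install:
        if not kernel_url or not initrd_url:
            raise ValueError("install requires kernel_url and initrd_url")
        lines += [
            f"kernel {kernel_url} {cmdline}".rstrip(),  # kernel plus boot parameters
            f"initrd {initrd_url}",
            "boot",
        ]
    else:
        # Leave iPXE; the firmware is expected to continue with the next
        # BootOrder entry (assumption: behavior varies between UEFI firmwares)
        lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ipxe_script("bare-metal5.qe.prg2.suse.org", install=False))
```

Keeping the decision in the bootscript means the BMC boot override is never needed, which sidesteps the firmware ignoring it.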
Updated by xguo about 19 hours ago · Edited
Julie_CAO wrote in #note-13:
@xguo Do you have ideas?
No ideas.
I tried adding IPMI_BACKEND_MC_RESET=1, but I can confirm that it does not work very well with bare-metal5 either; bare-metal5 is still unstable.
FYI.
Refer to https://openqa.suse.de/admin/workers/3992 for more details.