action #165782

open

[openQA][infra][ipxe][uefi][initrd] UEFI iPXE Machine fails to load initrd size:S

Added by waynechen55 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Support
Target version:
Start date: 2024-08-26
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

UEFI iPXE x86_64 machines, such as kermit-1.qe.nue2.suse.org and scooter-1.qe.nue2.suse.org, fail to load the initrd with the following error:

http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.kernel... ok
http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.initrd... 61%     ok
EFI stub: ERROR: Failed to open file: initrd
EFI stub: ERROR: Failed to load initrd: 0x800000000000000e
EFI stub: ERROR: efi_stub_entry() failed!
Could not boot: Error 0x7f04828e (http://ipxe.org/7f04828e)
Could not boot: Error 0x7f04828e (http://ipxe.org/7f04828e)
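
For readers unfamiliar with the status value in the log above, here is a minimal sketch of how it can be decoded. It assumes the usual UEFI convention that error codes have the top bit of the 64-bit value set; the table only lists a subset of the codes from the UEFI specification, and `decode_efi_status` is a name made up for this illustration.

```python
# Decode a UEFI status value as printed by the kernel's EFI stub.
# Error codes have the highest bit of the 64-bit value set.
EFI_ERROR_BIT = 1 << 63

# Subset of the error codes defined by the UEFI specification.
EFI_ERRORS = {
    1: "EFI_LOAD_ERROR",
    2: "EFI_INVALID_PARAMETER",
    3: "EFI_UNSUPPORTED",
    9: "EFI_OUT_OF_RESOURCES",
    14: "EFI_NOT_FOUND",
    15: "EFI_ACCESS_DENIED",
}


def decode_efi_status(status):
    """Translate a numeric EFI status into its symbolic name."""
    if status == 0:
        return "EFI_SUCCESS"
    if status & EFI_ERROR_BIT:
        code = status & ~EFI_ERROR_BIT  # strip the error bit
        return EFI_ERRORS.get(code, "unknown error %d" % code)
    return "warning %d" % status


print(decode_efi_status(0x800000000000000E))  # EFI_NOT_FOUND
```

The value decodes to EFI_NOT_FOUND, which fits the preceding "Failed to open file: initrd" line: the stub apparently could not find a file called "initrd", even though the download itself reported "ok".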

pxeboot.SL-Micro.x86_64-6.1.kernel and pxeboot.SL-Micro.x86_64-6.1.initrd come from SL Micro 6.1 Build12.2 pxe install tar https://download.suse.de/ibs/SUSE:/SLFO:/Products:/SL-Micro:/6.1:/ToTest/images/SL-Micro.x86_64-6.1-Default-SelfInstall-Build12.2.install.tar

Another x86_64 machine in Beijing successfully loads the kernel and initrd and finishes image deployment and firstboot configuration. After investigating for a while, I found this might be related to an outdated iPXE boot image in the OSD network. The web page http://ipxe.org/7f04828e states the following:

Error: Could not start image

(Error code 7f04828e)
Possible sources

This error originated from one of the following locations within the iPXE source code:

    image/efi_image.c (line 278)

General advice

    Try using the latest version of iPXE. Your problem may have already been fixed.
    Try building iPXE with the debug option DEBUG=efi_image
    You can contact the iPXE developers and other iPXE users.
    Refresh this page after 24 hours. This page is actively monitored, and further information may be added soon.

The iPXE boot script in use is as follows:

kernel http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.kernel rd.debug rd.kiwi.term rd.kiwi.install.pass.bootparam rd.kiwi.oem.installdevice=/dev/sda rd.kiwi.debug rd.kiwi.install.pxe rd.kiwi.install.pxe.curl_options=--retry,3,--retry-delay,3,--speed-limit,2048 rd.kiwi.install.image=http://10.144.97.141:8666/ipxe/x86_64/SL-Micro.aarch64-6.1.xz  ignition.firstboot ignition.config.url=http://server_ip_address/config//ignition/config.ign.x86_64 combustion.firstboot combustion.url=http://server_ip_address/config//combustion/script.x86_64.ttyS1 root=/dev/ram0 initrd=initrd textmode=1  plymouth.enable=0  video=1024x768 vt.color=0x07  console=ttyS1,115200  Y2DEBUG=1 linuxrc.log=/dev/ttyS1 linuxrc.core=/dev/ttyS1 linuxrc.debug=4,trace  reboot_timeout=0 

initrd http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.initrd

boot

Steps to reproduce

  • Post iPXE boot script to iPXE http server
  • Set uefi machine on OSD to boot from iPXE
  • Reboot the machine
  • Machine loads kernel and initrd

Impact

SL Micro 6.1 cannot be installed via iPXE, either manually or automatically.

Problem

Suspect outdated iPXE boot image on iPXE server.

Suggestions

  • Check iPXE boot image version on iPXE server
  • Update iPXE boot image to the latest version

Workaround

n/a


Related issues 2 (1 open, 1 closed)

Related to openQA Tests - action #166778: [tools][qe-core][qem] UEFI installation can not work any more due to qemu-ovmf package's update from worker side (Resolved, okurz, 2024-09-13)
Related to openQA Infrastructure - action #163529: redcurrant is unable to boot from PXE server (was: ppc64le pvm_hmc backend workers lost OS / grub access) size:S (Feedback, nicksinger, 2024-07-09)

Actions #2

Updated by waynechen55 3 months ago

  • Priority changed from Normal to High
Actions #3

Updated by tinita 3 months ago

  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #4

Updated by waynechen55 3 months ago

I found this

When booting a Linux kernel, iPXE will construct a “magic initrd” by injecting downloaded files into the initial RAM filesystem image. Any argument supplied to the initrd command will be used as the pathname for that image within the initrd.magic initial RAM filesystem.

in https://ipxe.org/cmd/imgfetch

So this really has something to do with iPXE boot image I think.
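
A rough way to check a boot script against the naming behaviour quoted above is sketched below. It is a simplified checker, not part of openQA: it assumes that an image fetched without --name is named after the last path component of its URI, it only understands the `initrd` and `imgfetch` commands, and `example.test` is a placeholder host. Whether a mismatch is fatal seems to depend on the iPXE build, so treat the result as a hint only.

```python
import posixpath
from urllib.parse import urlparse

# Options of initrd/imgfetch that consume the following word (simplified).
OPTIONS_WITH_VALUE = {"--name", "-n", "--timeout", "-t"}


def image_names(script):
    """Collect the names under which a script registers fetched images."""
    names = set()
    for line in script.splitlines():
        words = line.split()
        if not words or words[0] not in ("initrd", "imgfetch"):
            continue
        name, uri, i = None, None, 1
        while i < len(words):
            word = words[i]
            if word in OPTIONS_WITH_VALUE and i + 1 < len(words):
                if word in ("--name", "-n"):
                    name = words[i + 1]
                i += 2
            elif word.startswith("-"):
                i += 1  # flag without a value
            else:
                uri = word  # first positional argument is the URI
                break
        if name is None and uri is not None:
            # Assumption: without --name the image is named after the
            # last path component of its URI.
            name = posixpath.basename(urlparse(uri).path)
        if name:
            names.add(name)
    return names


def unmatched_initrds(script):
    """Return initrd= values on kernel lines that match no image name."""
    names = image_names(script)
    missing = []
    for line in script.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            for word in words[1:]:
                if word.startswith("initrd="):
                    wanted = word.split("=", 1)[1]
                    if wanted not in names:
                        missing.append(wanted)
    return missing


FAILING = """kernel http://example.test/ipxe/pxeboot.kernel root=/dev/ram0 initrd=initrd
initrd http://example.test/ipxe/pxeboot.initrd
boot
"""
NAMED = FAILING.replace("initrd http", "initrd --name initrd http")

print(unmatched_initrds(FAILING))  # ['initrd']
print(unmatched_initrds(NAMED))    # []
```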

Actions #5

Updated by waynechen55 3 months ago

waynechen55 wrote in #note-4:

I found this

When booting a Linux kernel, iPXE will construct a “magic initrd” by injecting downloaded files into the initial RAM filesystem image. Any argument supplied to the initrd command will be used as the pathname for that image within the initrd.magic initial RAM filesystem.

in https://ipxe.org/cmd/imgfetch

So this really has something to do with iPXE boot image I think.

FYI @okurz @nicksinger @livdywan

Actions #6

Updated by livdywan 3 months ago

  • Tags set to infra
Actions #7

Updated by waynechen55 3 months ago · Edited

From IT guy:

host  kermit-1.qe.nue2.suse.org.
kermit-1.qe.nue2.suse.org has address 10.168.192.89
kermit-1.qe.nue2.suse.org has IPv6 address 2a07:de40:a102:5:ae1f:6bff:fe47:326

DHCP:

kermit-1:
  mac: 'ac:1f:6b:47:03:26'
  ip4: kermit-1.qe.nue2.suse.org
  dhcp_filename: 'kernelqa/ipxe.efi'

PXE server: 10.168.192.10 -> qa-jump.qe.nue2.suse.org.

qa-jump:~ #  l /srv/tftpboot/kernelqa/ipxe.efi
-rw-r--r-- 1 tftp tftp 945216 Mar 14  2023 /srv/tftpboot/kernelqa/ipxe.efi


host scooter-1.qe.nue2.suse.org.
scooter-1.qe.nue2.suse.org has address 10.168.192.87
scooter-1.qe.nue2.suse.org has IPv6 address 2a07:de40:a102:5:ae1f:6bff:fe47:7338

DHCP:
scooter-1:
  mac: 'ac:1f:6b:47:73:38'
  ip4: scooter-1.qe.nue2.suse.org
  dhcp_filename: 'kernelqa/ipxe.efi'


host squiddlydiddly.qe.nue2.suse.org
squiddlydiddly.qe.nue2.suse.org has address 10.168.193.18


  squiddlydiddly-1:
  mac: '7c:c2:55:85:f9:a4'
  ip4: squiddlydiddly.qe.nue2.suse.org
  hostname: squiddlydiddly
  dhcp_filename: 'kernelqa/ipxe_aarch64.efi'

  qa-jump:~ # l /srv/tftpboot/kernelqa/ipxe_aarch64.efi 
-rw-r--r-- 1 tftp tftp 912032 Mar 14  2023 /srv/tftpboot/kernelqa/ipxe_aarch64.efi

That means all hosts use the same PXE server, 10.168.192.10.
kermit-1.qe.nue2.suse.org. and scooter-1.qe.nue2.suse.org. boot the file kernelqa/ipxe.efi.

The server squiddlydiddly.qe.nue2.suse.org boots the file kernelqa/ipxe_aarch64.efi.

Both files, ipxe_aarch64.efi and ipxe.efi, are NOT symlinks to any RPM; they are just binary files.
I did not find any record of how they were copied to the server or what their source is.
They are also not managed by salt, because the server qa-jump.qe.nue2.suse.org is not our official PXE server.

I think it is something that was done by QA itself in the past.
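
Since the origin of these binaries is unknown, one low-effort step is to fingerprint them and compare against an image from a lab where booting works. The following is only a sketch: `fingerprint` and `same_image` are hypothetical helper names, and the demo uses throwaway files rather than the real /srv/tftpboot/kernelqa/ipxe.efi.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return the size and SHA-256 digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large images need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "sha256": digest.hexdigest()}


def same_image(path_a, path_b):
    """True if both files have identical size and content hash."""
    return fingerprint(path_a) == fingerprint(path_b)


# Self-contained demo with throwaway files instead of real boot images.
with tempfile.TemporaryDirectory() as tmp:
    osd = os.path.join(tmp, "osd-ipxe.efi")
    other = os.path.join(tmp, "other-ipxe.efi")
    with open(osd, "wb") as fh:
        fh.write(b"image built in 2023")
    with open(other, "wb") as fh:
        fh.write(b"image built later")
    demo_result = same_image(osd, other)

print(demo_result)  # False
```

A matching hash would rule out the binary as the difference between the labs; a differing hash only shows that the images are not identical, not which one is newer.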

@okurz @nicksinger @livdywan or the kernel QA guys, can you help?

Actions #8

Updated by waynechen55 3 months ago

I think it would be better to take a small step forward to start solving the problem:

  1. In the current situation, there is no record of when and by whom this ipxe.efi was copied, let alone of its source. So who should be responsible for maintaining ipxe.efi? And what should be the source of ipxe.efi on our iPXE server?

  2. IT suggests:

We (as the Infra team) should manage this with our salt. I would prefer it to be an RPM in an official SUSE repo, or at least a project inside IBS / build.suse.de. Taking random code from the web, compiling it and running it does not sound good to me. Somebody should review the code and confirm that it does not contain a back door or any other spyware.

So is this feasible, or can it be improved?

@okurz @nicksinger @livdywan @kernelqa

Actions #9

Updated by waynechen55 3 months ago · Edited

Hi Kernel QA,

@MMoese @pcervinka

The Tools team thinks Kernel QA can help with the next step in solving this issue.

  • The original issue is

UEFI iPXE x86_64 machines, such as kermit-1.qe.nue2.suse.org and scooter-1.qe.nue2.suse.org, and aarch64 machines, such as squiddlydiddly, fail to load the SL Micro 6.1 initrd with the following error:

http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.kernel... ok
http://server_ip_address/ipxe/x86_64/pxeboot.SL-Micro.x86_64-6.1.initrd... 61%     ok
EFI stub: ERROR: Failed to open file: initrd
EFI stub: ERROR: Failed to load initrd: 0x800000000000000e
EFI stub: ERROR: efi_stub_entry() failed!
Could not boot: Error 0x7f04828e (http://ipxe.org/7f04828e)
Could not boot: Error 0x7f04828e (http://ipxe.org/7f04828e)

pxeboot.SL-Micro.x86_64-6.1.kernel and pxeboot.SL-Micro.x86_64-6.1.initrd come from SL Micro 6.1 Build12.2 pxe install tar https://download.suse.de/ibs/SUSE:/SLFO:/Products:/SL-Micro:/6.1:/ToTest/images/SL-Micro.x86_64-6.1-Default-SelfInstall-Build12.2.install.tar

But all UEFI iPXE x86_64 and aarch64 machines in the Beijing lab boot successfully with the same kernel and initrd.

  • I found
When booting a Linux kernel, iPXE will construct a “magic initrd” by injecting downloaded files into the initial RAM filesystem image. Any argument supplied to the initrd command will be used as the pathname for that image within the initrd.magic initial RAM filesystem.

in https://ipxe.org/cmd/imgfetch. So this really has something to do with iPXE boot image I think.

  • According to what the IT guys found, the iPXE binary is not maintained by them and seems to have been set up by QA in the past. It turns out to be Kernel QA, judging by
host  kermit-1.qe.nue2.suse.org.
kermit-1.qe.nue2.suse.org has address 10.168.192.89
kermit-1.qe.nue2.suse.org has IPv6 address 2a07:de40:a102:5:ae1f:6bff:fe47:326

DHCP:

kermit-1:
  mac: 'ac:1f:6b:47:03:26'
  ip4: kermit-1.qe.nue2.suse.org
  dhcp_filename: 'kernelqa/ipxe.efi'

PXE server: 10.168.192.10 -> qa-jump.qe.nue2.suse.org.

qa-jump:~ #  l /srv/tftpboot/kernelqa/ipxe.efi
-rw-r--r-- 1 tftp tftp 945216 Mar 14  2023 /srv/tftpboot/kernelqa/ipxe.efi


host scooter-1.qe.nue2.suse.org.
scooter-1.qe.nue2.suse.org has address 10.168.192.87
scooter-1.qe.nue2.suse.org has IPv6 address 2a07:de40:a102:5:ae1f:6bff:fe47:7338

DHCP:
scooter-1:
  mac: 'ac:1f:6b:47:73:38'
  ip4: scooter-1.qe.nue2.suse.org
  dhcp_filename: 'kernelqa/ipxe.efi'


host squiddlydiddly.qe.nue2.suse.org
squiddlydiddly.qe.nue2.suse.org has address 10.168.193.18


  squiddlydiddly-1:
  mac: '7c:c2:55:85:f9:a4'
  ip4: squiddlydiddly.qe.nue2.suse.org
  hostname: squiddlydiddly
  dhcp_filename: 'kernelqa/ipxe_aarch64.efi'

  qa-jump:~ # l /srv/tftpboot/kernelqa/ipxe_aarch64.efi 
-rw-r--r-- 1 tftp tftp 912032 Mar 14  2023 /srv/tftpboot/kernelqa/ipxe_aarch64.efi
Actions #10

Updated by MMoese 3 months ago

I think you have your bootscript incorrectly configured for these machines. I would rather not replace the file we have for our machines with something different - as this might also break our machines that work perfectly well.
Rather I would suggest you build and maintain your own iPXE file, they can safely reside somewhere else and you could update the dhcp configs to point to this file. Building such a binary is really easy, and our confluence pages have everything you need for this. But I'm willing to help you if you have issues with that, just ping me on Slack then. But please don't overwrite whatever files we have in kernelqa/ subdirectory.

Actions #11

Updated by waynechen55 3 months ago

MMoese wrote in #note-10:

I think you have your bootscript incorrectly configured for these machines. I would rather not replace the file we have for our machines with something different - as this might also break our machines that work perfectly well.
Rather I would suggest you build and maintain your own iPXE file, they can safely reside somewhere else and you could update the dhcp configs to point to this file. Building such a binary is really easy, and our confluence pages have everything you need for this. But I'm willing to help you if you have issues with that, just ping me on Slack then. But please don't overwrite whatever files we have in kernelqa/ subdirectory.

I opened this ticket only after many tries with different configurations, none of which works with these machines on OSD. By contrast, all machines in Beijing boot easily and successfully. And the iPXE binary image really does have something to do with kernel/initrd loading. So I think:

  1. If you think the bootscript for these machines is not configured correctly, could you let me know which setting is incorrect, which is missing, or which should be removed?
  2. If you do not know either, I can continue to explore. If, in the end, updating the iPXE binary is needed, then it has to be done.
Actions #12

Updated by livdywan 2 months ago

  • Subject changed from [openQA][infra][ipxe][uefi][initrd] UEFI iPXE Machine fails to load initrd to [openQA][infra][ipxe][uefi][initrd] UEFI iPXE Machine fails to load initrd size:S
  • Category changed from Regressions/Crashes to Support
Actions #13

Updated by okurz 2 months ago

  • Priority changed from High to Normal
Actions #14

Updated by waynechen55 2 months ago · Edited

@MMoese

I think I solved the problem to some extent, but I think:

  1. The current iPXE boot image may not handle downloading the PXE-capable SL-Micro 6.1 initrd well, so no one knows what the downloaded initrd will be converted to. I tried many possibilities, but nothing works.

  2. I solved the problem to some extent by using:

kernel .......... initrd="TO BE USED INITRD NAME"..........
initrd --name "TO BE USED INITRD NAME" INITRD_URL

The current ipxe_install.pm module does not pass the --name option to the initrd command. I just used "initrd" as the name. Does this look like a workaround?

However, I have not run a full installation/configuration yet because of an issue downloading from an external host.
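
To make the workaround concrete, here is a small sketch of a generator that keeps the kernel's initrd= argument and the initrd command's --name in sync. It is not the actual ipxe_install.pm code (that module is Perl); `build_boot_script`, its parameters and the `example.test` URLs are made up for illustration.

```python
def build_boot_script(kernel_url, initrd_url, kernel_args, initrd_name="initrd"):
    """Render an iPXE script whose initrd= and --name values agree."""
    # Drop any existing initrd= argument so it cannot disagree with --name.
    args = [a for a in kernel_args if not a.startswith("initrd=")]
    args.append("initrd=" + initrd_name)
    lines = [
        "#!ipxe",  # standard first line of an iPXE script
        "kernel " + kernel_url + " " + " ".join(args),
        "initrd --name " + initrd_name + " " + initrd_url,
        "boot",
    ]
    return "\n".join(lines) + "\n"


script = build_boot_script(
    "http://example.test/ipxe/x86_64/pxeboot.kernel",
    "http://example.test/ipxe/x86_64/pxeboot.initrd",
    ["root=/dev/ram0", "initrd=something-else", "console=ttyS1,115200"],
)
print(script)
```

Deriving both values from one parameter means a later change to the name cannot leave the two lines out of step, which is the failure mode this ticket ran into.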

Actions #15

Updated by MMoese 2 months ago

  1. What do you mean with conversion? The image just gets downloaded to RAM?
  2. Feel free to add a PR to add the name to the initrd in ipxe_install.pm

Not sure if this is a workaround or actually needed like this - but in doubt better have a workaround than non-working machines :)

Actions #16

Updated by waynechen55 2 months ago

MMoese wrote in #note-15:

  1. What do you mean with conversion? The image just gets downloaded to RAM?
  2. Feel free to add a PR to add the name to the initrd in ipxe_install.pm

Not sure if this is a workaround or actually needed like this - but in doubt better have a workaround than non-working machines :)

I think the current iPXE image does not convert the downloaded initrd image into a well-known file such as "initrd", "initrd.magic" or similar. This might be because this iPXE image does not treat the file pxeboot.SL-Micro.x86_64-6.1.initrd the same as a file named initrd.

Actions #17

Updated by okurz 2 months ago

  • Target version changed from Ready to future

@MMoese @waynechen55 I trust that you can figure out the problems on your own :) So removing from the tools team backlog.

Actions #18

Updated by okurz 2 months ago

  • Related to action #166778: [tools][qe-core][qem] UEFI installation can not work any more due to qemu-ovmf package's update from worker side added
Actions #19

Updated by okurz 2 months ago

  • Related to action #163529: redcurrant is unable to boot from PXE server (was: ppc64le pvm_hmc backend workers lost OS / grub access) size:S added