action #137564

closed

test fails sporadically in Autoyast iSCSI - cannot boot from ROM, '/usr/share/qemu/ipxe.lkrn' file may need update

Added by tinawang123 about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2023-10-07
Due date:
% Done: 0%
Estimated time:

Description

Motivation
Related failed job: https://openqa.suse.de/tests/12413982#step/bootloader_start/3
All the failed jobs are iSCSI-related.
The qemu command line sets -boot once=d, but the machine does not boot from DVD.

Actions #1

Updated by rfan1 about 1 year ago

Some findings from my side:
Based on the openQA results, the passing jobs ran on worker2, which uses qemu 6.2.x, whereas the failing jobs ran on worker30, which uses qemu 7.1.x.

I am not sure whether the new qemu version makes a difference. I will investigate further and ask the qe-tools and qe-virtualization teams for help.

Actions #2

Updated by okurz about 1 year ago

Let's follow up on that hypothesis. We don't have many machines left on Leap 15.4, but there are still some. So I triggered

for i in {001..010}; do
    for class in sapworker1 openqaworker14 worker30 worker40; do
        openqa-clone-job --within-instance openqa.suse.de/t12413982 \
            INCLUDE_MODULES=prepare_profile,bootloader_start _GROUP=0 BUILD=poo137564 \
            TEST+=-poo137564-$class-$i WORKER_CLASS=$class
    done
done

https://openqa.suse.de/tests/overview?version=15-SP6&distri=sle&build=poo137564

openqaworker14 still runs Leap 15.4. As for the qemu versions:
sudo salt --no-color -L 'sapworker1.qe.nue2.suse.org,openqaworker14.qa.suse.cz,worker30.oqa.prg2.suse.org,worker40.oqa.prg2.suse.org' cmd.run 'rpm -q qemu'

shows

worker40.oqa.prg2.suse.org:
    qemu-7.1.0-150500.49.6.1.x86_64
worker30.oqa.prg2.suse.org:
    qemu-7.1.0-150500.49.6.1.x86_64
sapworker1.qe.nue2.suse.org:
    qemu-7.1.0-150500.49.6.1.x86_64
openqaworker14.qa.suse.cz:
    qemu-6.2.0-150400.37.20.1.x86_64

https://openqa.suse.de/tests/overview?version=15-SP6&distri=sle&build=poo137564 clearly shows that booting from CDROM failed in 30/30 runs on qemu-7.1.0. The logs also show an error, e.g. see https://openqa.suse.de/tests/12417758/logfile?filename=autoinst-log.txt

[2023-10-07T13:19:15.557786+02:00] [debug] [pid:95172] QEMU: KVM internal error. Suberror: 1
[2023-10-07T13:19:15.557879+02:00] [debug] [pid:95172] QEMU: emulation failure
[2023-10-07T13:19:15.557930+02:00] [debug] [pid:95172] QEMU: EAX=00000000 EBX=00000000 ECX=00001000 EDX=00001000
[2023-10-07T13:19:15.557983+02:00] [debug] [pid:95172] QEMU: ESI=00011020 EDI=000903ab EBP=00009cbc ESP=00002fbe
[2023-10-07T13:19:15.558028+02:00] [debug] [pid:95172] QEMU: EIP=00000392 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
[2023-10-07T13:19:15.558072+02:00] [debug] [pid:95172] QEMU: ES =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
[2023-10-07T13:19:15.558110+02:00] [debug] [pid:95172] QEMU: CS =0008 00010200 ffffffff 00809b00 DPL=0 CS16 [-RA]
[2023-10-07T13:19:15.558169+02:00] [debug] [pid:95172] QEMU: SS =9cbc 0009cbc0 0000ffff 00009300 DPL=0 DS16 [-WA]
[2023-10-07T13:19:15.558233+02:00] [debug] [pid:95172] QEMU: DS =1020 00010200 0000ffff 00009300 DPL=0 DS16 [-WA]
[2023-10-07T13:19:15.558290+02:00] [debug] [pid:95172] QEMU: FS =1000 00010000 0000ffff 00009300 DPL=0 DS16 [-WA]
[2023-10-07T13:19:15.558330+02:00] [debug] [pid:95172] QEMU: GS =1000 00010000 0000ffff 00009300 DPL=0 DS16 [-WA]
[2023-10-07T13:19:15.558378+02:00] [debug] [pid:95172] QEMU: LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
[2023-10-07T13:19:15.558422+02:00] [debug] [pid:95172] QEMU: TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
[2023-10-07T13:19:15.558464+02:00] [debug] [pid:95172] QEMU: GDT=     0009fb84 0000001f
[2023-10-07T13:19:15.558509+02:00] [debug] [pid:95172] QEMU: IDT=     00000000 000003ff
[2023-10-07T13:19:15.558554+02:00] [debug] [pid:95172] QEMU: CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
[2023-10-07T13:19:15.558591+02:00] [debug] [pid:95172] QEMU: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
[2023-10-07T13:19:15.558636+02:00] [debug] [pid:95172] QEMU: DR6=00000000ffff0ff0 DR7=0000000000000400
[2023-10-07T13:19:15.558675+02:00] [debug] [pid:95172] QEMU: EFER=0000000000000000
[2023-10-07T13:19:15.558713+02:00] [debug] [pid:95172] QEMU: Code=6e 74 69 6e 75 65 0a 00 0a 50 61 79 6c 6f 61 64 20 69 6e 61 <63> 63 65 73 73 69 62 6c 65 20 2d 20 63 61 
[2023-10-07T13:19:15.558855+02:00] [debug] [pid:95172] QEMU: 6e 6e 6f 74 20 63 6f 6e 74 69 6e 75 65 0a 00 67
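One detail in this dump is worth a closer look: the bytes on the "Code=" lines are all printable ASCII plus a few newline and NUL bytes, which suggests the CPU was pointed at string data rather than instructions. The sketch below decodes the bytes copied from the log above. The helper name decode_hex is made up for illustration, and the <63> marker (the byte at EIP) is written as a plain 63. Where the message originates is not confirmed anywhere in this ticket. It plausibly comes from the loaded ipxe.lkrn image, but that is an assumption.

```shell
#!/bin/bash
# Bytes copied verbatim from the two "Code=" log lines above.
# The "<63>" marker (byte at EIP) is written here as a plain 63.
code_bytes="6e 74 69 6e 75 65 0a 00 0a 50 61 79 6c 6f 61 64 20 69 6e 61 63 63 65 73 73 69 62 6c 65 20 2d 20 63 61 6e 6e 6f 74 20 63 6f 6e 74 69 6e 75 65 0a 00 67"

# decode_hex (hypothetical helper): turn space-separated hex bytes into text,
# replacing every non-printable byte (here: newline and NUL) with a dot.
decode_hex() {
    local b
    for b in $1; do
        printf "\\x${b}"
    done | LC_ALL=C tr -c '[:print:]' '.'
}

decoded=$(decode_hex "$code_bytes")
echo "$decoded"
# prints: ntinue...Payload inaccessible - cannot continue..g
```

The decoded text contains the message "Payload inaccessible - cannot continue", so the failure happens in early boot code that could not reach its payload, before any boot from CDROM is attempted.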

The successful jobs like https://openqa.suse.de/tests/12417727/logfile?filename=autoinst-log.txt do not show this KVM error. In 10/10 jobs on qemu-6 on https://openqa.suse.de/tests/overview?version=15-SP6&distri=sle&build=poo137564 no problem is visible.

Other successful jobs on Leap 15.5 workers with qemu do not show this problem, e.g. see https://openqa.suse.de/tests/12409619/logfile?filename=autoinst-log.txt . I suggest you look further into this qemu/kvm error and into why the referenced test scenario seems to trigger it while other qemu-x86_64 jobs do not.
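The comparison above (failed jobs show the KVM error, passed jobs do not) can be repeated over any number of downloaded logs. This is a minimal sketch, assuming the autoinst-log.txt files have already been saved locally. The function name scan_logs and the output labels are made up for illustration. The search string is the exact message from the log excerpt above.

```shell
#!/bin/bash
# scan_logs (hypothetical helper): for each given log file, report whether
# it contains the "KVM internal error" line seen in the failing jobs.
scan_logs() {
    local f
    for f in "$@"; do
        if grep -q 'QEMU: KVM internal error' "$f"; then
            echo "kvm-error $f"
        else
            echo "clean $f"
        fi
    done
}

# Example call with hypothetical local file names:
#   scan_logs 12417758-autoinst-log.txt 12417727-autoinst-log.txt
```

Grouping the result by worker host should then reproduce the 30/30 versus 10/10 split described above, if the hypothesis holds.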

Actions #3

Updated by rfan1 about 1 year ago

  • Assignee set to rfan1
Actions #4

Updated by leli about 1 year ago

  • Status changed from New to Workable
  • Target version set to Current
Actions #5

Updated by rfan1 about 1 year ago

  • Status changed from Workable to In Progress

Many thanks @okurz for the kind update!

After investigating further, I found a clue: the system seems to hang at the iPXE ROM and fails to proceed to boot from CD-ROM. For this iSCSI installation test, we use the ipxe.lkrn file to start the kernel boot, hook the iSCSI LUN, and then install the system. The current ipxe.lkrn file might be out of date and need an update:

 -kernel /usr/share/qemu/ipxe.lkrn -append dhcp && sanhook iscsi:xx.xx.xx.xx::3260:1:iqn.2016-02.openqa.de:for.openqa

I tried rebuilding the /usr/share/qemu/ipxe.lkrn file following the steps described in https://ipxe.org/download and replaced the one on worker30:

# git clone https://github.com/ipxe/ipxe.git
# cd ipxe/src
# make

The test case now passes with the new ipxe.lkrn file: https://openqa.suse.de/tests/12417826

-rw-r--r-- 1 root root  339007 Oct  7 16:04 ipxe.lkrn [new one]
-rw-r--r-- 1 root root  310511 Oct  7 16:08 ipxe.lkrn.orig [old one]
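File size and timestamp alone are a weak way to tell two ipxe.lkrn images apart. A small sketch, assuming GNU coreutils (cmp, wc, sha256sum) are available on the worker: the helper names same_image and image_summary are made up for illustration, and the path in the usage comment is the one mentioned in this ticket.

```shell
#!/bin/bash
# same_image (hypothetical helper): succeeds only if both files are
# byte-for-byte identical.
same_image() {
    cmp -s "$1" "$2"
}

# image_summary (hypothetical helper): print "<size in bytes> <sha256>"
# for one file, so the result can be pasted into a ticket comment.
image_summary() {
    local size sum
    size=$(( $(wc -c < "$1") ))
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    echo "$size $sum"
}

# Example call on a worker, using the file names from the listing above:
#   cd /usr/share/qemu
#   image_summary ipxe.lkrn; image_summary ipxe.lkrn.orig
#   same_image ipxe.lkrn ipxe.lkrn.orig || echo "images differ"
```

Recording the checksum makes it possible to verify later that every worker carries the same image, whichever process ends up deploying it.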

However, I don't think I am following the right process to update/replace the ipxe.lkrn file. IMO this file should be maintained by salt or some other openQA process.

Can the tools team shed some light on this?

In summary, after updating the ipxe.lkrn file, the issue is gone on worker30 with qemu version 7.1.x.

Actions #6

Updated by rfan1 about 1 year ago

  • Subject changed from test fails sporadically in Autoyast iSCSI - cannot boot from ROM to test fails sporadically in Autoyast iSCSI - cannot boot from ROM, 'ipxe.lkrn' file may need update
Actions #7

Updated by rfan1 about 1 year ago

  • Subject changed from test fails sporadically in Autoyast iSCSI - cannot boot from ROM, 'ipxe.lkrn' file may need update to test fails sporadically in Autoyast iSCSI - cannot boot from ROM, '/usr/share/qemu/ipxe.lkrn' file may need update
Actions #8

Updated by rfan1 about 1 year ago

I found that the package 'ipxe-bootimgs' already ships the built binary file, so let me try with it.

Actions #9

Updated by rfan1 about 1 year ago

Well, the test passed with the binary file from the ipxe-bootimgs package:
https://openqa.suse.de/tests/12455790

Let me post a PR/MR later to use this new one.

Actions #12

Updated by rfan1 about 1 year ago

  • Status changed from In Progress to Feedback

The PR and MR are both merged. Let me check the next openQA run. I may have to wait for the next openQA deployment to make sure all changes take effect.

Actions #13

Updated by rfan1 about 1 year ago

  • Status changed from Feedback to Resolved

With kind help from the tools team, the issue is fixed now.
