Project

General

Profile

Actions

action #129601

open

bootloader_uefi fails too often in aarch64

Added by jlausuch over 1 year ago. Updated 7 months ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
-
Start date:
2023-05-19
Due date:
% Done:

0%

Estimated time:
Difficulty:
Tags:

Description

installation/bootloader_uefi.pm times out quite often because openQA does not have time to enter uefi grub2 configuration screen before the 10s grub timeout.

e.g.
https://openqa.suse.de/tests/11157624#step/bootloader_uefi/11
https://openqa.suse.de/tests/11157620#step/bootloader_uefi/11
https://openqa.suse.de/tests/11157622#step/bootloader_uefi/11

The culprit is that the first bootloader-grub2 takes too long to match.

This is a successful run:

[2023-05-17T22:29:48.135632+02:00] [debug] [pid:78975] <<< testapi::assert_screen(mustmatch=[
    "bootloader-shim-import-prompt",
    "bootloader-grub2"
  ], timeout=90)
[2023-05-17T22:29:49.928958+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.82 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:29:49.929428+02:00] [debug] [pid:79369] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:50.104691+02:00] [debug] [pid:79369] no change: 88.0s
[2023-05-17T22:29:50.588948+02:00] [debug] [pid:79369] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:51.105278+02:00] [debug] [pid:79369] no change: 87.0s
[2023-05-17T22:29:51.589146+02:00] [debug] [pid:79369] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:52.105972+02:00] [debug] [pid:79369] no change: 86.0s
[2023-05-17T22:29:52.589983+02:00] [debug] [pid:79369] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:00.232487+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.13 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:00.232953+02:00] [debug] [pid:79369] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:00.248237+02:00] [debug] [pid:79369] no change: 77.8s
[2023-05-17T22:30:00.733227+02:00] [debug] [pid:79369] no match: 77.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:01.236512+02:00] [debug] [pid:79369] no change: 76.9s
[2023-05-17T22:30:01.743323+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.51 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:01.743772+02:00] [debug] [pid:79369] no match: 76.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:09.594737+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.36 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:09.595227+02:00] [debug] [pid:79369] no match: 75.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:10.332028+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.71 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:10.332520+02:00] [debug] [pid:79369] no match: 68.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:11.328004+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.72 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:11.328472+02:00] [debug] [pid:79369] no match: 67.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:12.205373+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.59 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:12.205867+02:00] [debug] [pid:79369] no match: 66.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:19.704649+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.09 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:19.705116+02:00] [debug] [pid:79369] no match: 65.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:20.417002+02:00] [debug] [pid:78975] >>> testapi::_handle_found_needle: found bootmenu-jeos-20201204, similarity 1.00 @ 101/208
[2023-05-17T22:30:20.419694+02:00] [debug] [pid:78975] tests/installation/bootloader_uefi.pm:124 called bootloader_setup::uefi_bootmenu_params -> lib/bootloader_setup.pm:383 called testapi::send_key_until_needlematch
[2023-05-17T22:30:20.420160+02:00] [debug] [pid:78975] <<< testapi::check_screen(mustmatch="grub2-enter-edit-mode", timeout=0)

And this is a failed run:

[2023-05-18T22:31:23.112731+02:00] [debug] [pid:20579] <<< testapi::assert_screen(mustmatch=[
    "bootloader-shim-import-prompt",
    "bootloader-grub2"
  ], timeout=90)
[2023-05-18T22:31:24.904170+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.82 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:24.904637+02:00] [debug] [pid:21382] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:25.081871+02:00] [debug] [pid:21382] no change: 88.0s
[2023-05-18T22:31:25.580143+02:00] [debug] [pid:21382] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:26.094924+02:00] [debug] [pid:21382] no change: 87.0s
[2023-05-18T22:31:26.569793+02:00] [debug] [pid:21382] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:27.083393+02:00] [debug] [pid:21382] no change: 86.0s
[2023-05-18T22:31:27.553981+02:00] [debug] [pid:21382] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:35.276260+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.19 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:35.276731+02:00] [debug] [pid:21382] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:36.083159+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.78 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:36.083646+02:00] [debug] [pid:21382] no match: 77.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:36.279755+02:00] [debug] [pid:21382] no change: 76.8s
[2023-05-18T22:31:37.020384+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:37.020854+02:00] [debug] [pid:21382] no match: 76.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:44.628366+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.35 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:44.628834+02:00] [debug] [pid:21382] no match: 75.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:45.401250+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:45.401731+02:00] [debug] [pid:21382] no match: 68.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:46.386500+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:46.386994+02:00] [debug] [pid:21382] no match: 67.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:47.387270+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:47.387869+02:00] [debug] [pid:21382] no match: 66.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:55.130059+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.48 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:55.130529+02:00] [debug] [pid:21382] no match: 65.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:55.715449+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.56 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:55.715931+02:00] [debug] [pid:21382] no match: 57.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:56.143795+02:00] [debug] [pid:21382] no change: 56.9s
[2023-05-18T22:31:56.669176+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.52 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:56.669652+02:00] [debug] [pid:21382] no match: 56.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:32:04.606465+02:00] [debug] [pid:20579] >>> testapi::_handle_found_needle: found bootmenu-jeos-20201204, similarity 1.00 @ 101/208
[2023-05-18T22:32:04.610142+02:00] [debug] [pid:20579] tests/installation/bootloader_uefi.pm:124 called bootloader_setup::uefi_bootmenu_params -> lib/bootloader_setup.pm:383 called testapi::send_key_until_needlematch
[2023-05-18T22:32:04.610684+02:00] [debug] [pid:20579] <<< testapi::check_screen(mustmatch="grub2-enter-edit-mode", timeout=0)

So, in the first case, the needle takes around 25s to be matched after the VM is started, but in the second case it takes ~45s, which makes the grub to pick the default option and boot the system without a chance to go to the grub2-enter-edit-mode screen.

Actions #1

Updated by jlausuch over 1 year ago

  • Description updated (diff)
Actions #2

Updated by jlausuch over 1 year ago

I see 2 options:
1) Send a random key in uefi_bootmenu_params before send_key_until_needlematch('grub2-enter-edit-mode', 'e', 6, 0.5) to stop the timeout

2) Clean-up some needles. Would be good to do it anyways, but this doesn't not guarantee 100% that it will happen again.

Actions #5

Updated by jlausuch over 1 year ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by jlausuch over 1 year ago

So, I have observed that the failure rate is much lower, but still some failures from time to time:
https://openqa.suse.de/tests/11191129#

Actions #11

Updated by okurz over 1 year ago

  • Category set to Bugs in existing tests
Actions #13

Updated by jlausuch about 1 year ago

  • Assignee deleted (jlausuch)
Actions #14

Updated by dzedro about 1 year ago

I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.

Actions #15

Updated by szarate about 1 year ago

dzedro wrote:

I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.

in general I'd vouch for setting that thing to infinite... either way we can go with the workaround for now and take the time to fix it the right way, same as we did with SLES...
As in: Guarantee that SUT will have an infinite timeout

in general I think extracting host_config, might be the way to go, that way we avoid the issue of multiple kiwi files, or multiple workarounds instead of a proper solution

See discussion: https://suse.slack.com/archives/C02CANHLANP/p1692183993494469

Actions #16

Updated by jlausuch about 1 year ago

dzedro wrote in #note-14:

I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'

+1000! That would give us some rest with restarting jobs, even if it's a workaround, it's a nice one.

Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.

This would be ideal but not possible in all cases, I remember some tests are expecting this screen on purpose to do some actions (I don't remember now of any concrete examples). I would prefer that the test owner has the choice of matching the grub screen or skip it (with KEEP_GRUB_TIMEOUT var).

Actions #17

Updated by jlausuch about 1 year ago

Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.

This job is an example of why we need to capture the grub screen so that we can modify some parameters.

Actions #19

Updated by okurz about 1 year ago

Installation tests should completely disable the grub timeout. Maybe that is lost from the schedule in some cases? For ready-created images the suggestion about changing in the qcow file sounds good

Actions #20

Updated by jlausuch about 1 year ago

okurz wrote in #note-19:

Installation tests should completely disable the grub timeout. Maybe that is lost from the schedule in some cases?
+1 SLE Micro jobs do that already.

For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?

Actions #21

Updated by okurz about 1 year ago

jlausuch wrote in #note-20:

For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?

Either bootloader module or separate preparation module

Actions #22

Updated by jlausuch about 1 year ago

okurz wrote in #note-21:

jlausuch wrote in #note-20:

For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?

Either bootloader module or separate preparation module

Ok, thanks for your input. I might come with some proposal.

Actions #23

Updated by jlausuch 7 months ago

  • Assignee set to jlausuch
Actions

Also available in: Atom PDF