action #129601
openbootloader_uefi fails too often in aarch64
0%
Description
installation/bootloader_uefi.pm times out quite often because openQA does not have time to enter uefi grub2 configuration screen before the 10s grub timeout.
e.g.
https://openqa.suse.de/tests/11157624#step/bootloader_uefi/11
https://openqa.suse.de/tests/11157620#step/bootloader_uefi/11
https://openqa.suse.de/tests/11157622#step/bootloader_uefi/11
The culprit is that the first bootloader-grub2
takes too long to match.
This is a successful run:
[2023-05-17T22:29:48.135632+02:00] [debug] [pid:78975] <<< testapi::assert_screen(mustmatch=[
"bootloader-shim-import-prompt",
"bootloader-grub2"
], timeout=90)
[2023-05-17T22:29:49.928958+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.82 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:29:49.929428+02:00] [debug] [pid:79369] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:50.104691+02:00] [debug] [pid:79369] no change: 88.0s
[2023-05-17T22:29:50.588948+02:00] [debug] [pid:79369] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:51.105278+02:00] [debug] [pid:79369] no change: 87.0s
[2023-05-17T22:29:51.589146+02:00] [debug] [pid:79369] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:29:52.105972+02:00] [debug] [pid:79369] no change: 86.0s
[2023-05-17T22:29:52.589983+02:00] [debug] [pid:79369] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:00.232487+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.13 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:00.232953+02:00] [debug] [pid:79369] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:00.248237+02:00] [debug] [pid:79369] no change: 77.8s
[2023-05-17T22:30:00.733227+02:00] [debug] [pid:79369] no match: 77.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:01.236512+02:00] [debug] [pid:79369] no change: 76.9s
[2023-05-17T22:30:01.743323+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.51 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:01.743772+02:00] [debug] [pid:79369] no match: 76.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:09.594737+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.36 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:09.595227+02:00] [debug] [pid:79369] no match: 75.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:10.332028+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.71 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:10.332520+02:00] [debug] [pid:79369] no match: 68.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:11.328004+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.72 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:11.328472+02:00] [debug] [pid:79369] no match: 67.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:12.205373+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.59 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:12.205867+02:00] [debug] [pid:79369] no match: 66.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:19.704649+02:00] [warn] [pid:79369] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.09 seconds for 14 candidate needles - make your needles more specific
[2023-05-17T22:30:19.705116+02:00] [debug] [pid:79369] no match: 65.5s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-17T22:30:20.417002+02:00] [debug] [pid:78975] >>> testapi::_handle_found_needle: found bootmenu-jeos-20201204, similarity 1.00 @ 101/208
[2023-05-17T22:30:20.419694+02:00] [debug] [pid:78975] tests/installation/bootloader_uefi.pm:124 called bootloader_setup::uefi_bootmenu_params -> lib/bootloader_setup.pm:383 called testapi::send_key_until_needlematch
[2023-05-17T22:30:20.420160+02:00] [debug] [pid:78975] <<< testapi::check_screen(mustmatch="grub2-enter-edit-mode", timeout=0)
And this is a failed run:
[2023-05-18T22:31:23.112731+02:00] [debug] [pid:20579] <<< testapi::assert_screen(mustmatch=[
"bootloader-shim-import-prompt",
"bootloader-grub2"
], timeout=90)
[2023-05-18T22:31:24.904170+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.82 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:24.904637+02:00] [debug] [pid:21382] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:25.081871+02:00] [debug] [pid:21382] no change: 88.0s
[2023-05-18T22:31:25.580143+02:00] [debug] [pid:21382] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:26.094924+02:00] [debug] [pid:21382] no change: 87.0s
[2023-05-18T22:31:26.569793+02:00] [debug] [pid:21382] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:27.083393+02:00] [debug] [pid:21382] no change: 86.0s
[2023-05-18T22:31:27.553981+02:00] [debug] [pid:21382] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:35.276260+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.19 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:35.276731+02:00] [debug] [pid:21382] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:36.083159+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.78 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:36.083646+02:00] [debug] [pid:21382] no match: 77.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:36.279755+02:00] [debug] [pid:21382] no change: 76.8s
[2023-05-18T22:31:37.020384+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:37.020854+02:00] [debug] [pid:21382] no match: 76.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:44.628366+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.35 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:44.628834+02:00] [debug] [pid:21382] no match: 75.8s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:45.401250+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:45.401731+02:00] [debug] [pid:21382] no match: 68.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:46.386500+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:46.386994+02:00] [debug] [pid:21382] no match: 67.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:47.387270+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:47.387869+02:00] [debug] [pid:21382] no match: 66.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:55.130059+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.48 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:55.130529+02:00] [debug] [pid:21382] no match: 65.4s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:55.715449+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.56 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:55.715931+02:00] [debug] [pid:21382] no match: 57.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:31:56.143795+02:00] [debug] [pid:21382] no change: 56.9s
[2023-05-18T22:31:56.669176+02:00] [warn] [pid:21382] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.52 seconds for 14 candidate needles - make your needles more specific
[2023-05-18T22:31:56.669652+02:00] [debug] [pid:21382] no match: 56.9s, best candidate: bootloader_uefi-20210917 (0.00)
[2023-05-18T22:32:04.606465+02:00] [debug] [pid:20579] >>> testapi::_handle_found_needle: found bootmenu-jeos-20201204, similarity 1.00 @ 101/208
[2023-05-18T22:32:04.610142+02:00] [debug] [pid:20579] tests/installation/bootloader_uefi.pm:124 called bootloader_setup::uefi_bootmenu_params -> lib/bootloader_setup.pm:383 called testapi::send_key_until_needlematch
[2023-05-18T22:32:04.610684+02:00] [debug] [pid:20579] <<< testapi::check_screen(mustmatch="grub2-enter-edit-mode", timeout=0)
So, in the first case, the needle takes around 25s to be matched after the VM is started, but in the second case it takes ~45s, which makes the grub to pick the default option and boot the system without a chance to go to the grub2-enter-edit-mode
screen.
Updated by jlausuch over 1 year ago
I see 2 options:
1) Send a random key in uefi_bootmenu_params before send_key_until_needlematch('grub2-enter-edit-mode', 'e', 6, 0.5)
to stop the timeout
2) Clean-up some needles. Would be good to do it anyways, but this doesn't not guarantee 100% that it will happen again.
Updated by jlausuch over 1 year ago
Also happening in O3: https://openqa.opensuse.org/tests/3299645#step/bootloader_uefi/1
Updated by jlausuch over 1 year ago
- Status changed from New to In Progress
- Assignee set to jlausuch
Updated by jlausuch over 1 year ago
- Status changed from In Progress to Feedback
Let's see if this improves the situation
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17102
Updated by jlausuch over 1 year ago
So, I have observed that the failure rate is much lower, but still some failures from time to time:
https://openqa.suse.de/tests/11191129#
Updated by jlausuch over 1 year ago
Maybe this also helps: https://gitlab.suse.de/openqa/os-autoinst-needles-sles/-/merge_requests/1642
Updated by jlausuch over 1 year ago
Updated by dzedro about 1 year ago
I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.
Updated by szarate about 1 year ago
dzedro wrote:
I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.
in general I'd vouch for setting that thing to infinite... either way we can go with the workaround for now and take the time to fix it the right way, same as we did with SLES...
As in: Guarantee that SUT will have an infinite timeout
in general I think extracting host_config, might be the way to go, that way we avoid the issue of multiple kiwi files, or multiple workarounds instead of a proper solution
See discussion: https://suse.slack.com/archives/C02CANHLANP/p1692183993494469
Updated by jlausuch about 1 year ago
dzedro wrote in #note-14:
I would use this as a workaround on uefi/aarch64 with small grub2 timeout.
Increase grub2 timeout after the qcow image is converted. https://github.com/os-autoinst/os-autoinst/blob/master/backend/qemu.pm#L848
virt-edit image.qcow2 /boot/grub2/grub.cfg -e 's/timeout=10/timeout=60/'
+1000! That would give us some rest with restarting jobs, even if it's a workaround, it's a nice one.
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.
This would be ideal but not possible in all cases, I remember some tests are expecting this screen on purpose to do some actions (I don't remember now of any concrete examples). I would prefer that the test owner has the choice of matching the grub screen or skip it (with KEEP_GRUB_TIMEOUT
var).
Updated by jlausuch about 1 year ago
Or don't complicate it and ignore grub2 if test reaches shell, but not sure how hard this can be done in existing code.
This job is an example of why we need to capture the grub screen so that we can modify some parameters.
Updated by okurz about 1 year ago
Installation tests should completely disable the grub timeout. Maybe that is lost from the schedule in some cases? For ready-created images the suggestion about changing in the qcow file sounds good
Updated by jlausuch about 1 year ago
okurz wrote in #note-19:
Installation tests should completely disable the grub timeout. Maybe that is lost from the schedule in some cases?
+1 SLE Micro jobs do that already.For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?
Updated by okurz about 1 year ago
jlausuch wrote in #note-20:
For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?
Either bootloader module or separate preparation module
Updated by jlausuch about 1 year ago
okurz wrote in #note-21:
jlausuch wrote in #note-20:
For ready-created images the suggestion about changing in the qcow file sounds good
Where do you think this could fit better? os-autoinst? or in the bootloader modules?Either bootloader module or separate preparation module
Ok, thanks for your input. I might come with some proposal.