action #113447
closed [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but openQA still expects the grub edit screen
0%
Description
Observation
openQA test in scenario sle-15-SP4-JeOS-for-kvm-and-xen-Updates-aarch64-jeos-ltp-syscalls@aarch64 fails in
bootloader_uefi as the system continues to boot but the test expects the grub edit screen.
A more recent failure is also
openQA test in scenario microos-Tumbleweed-MicroOS-Image-ContainerHost-aarch64-container-host@aarch64 fails in
disk_boot
Problem
openQA never sees GRUB: first there is only a black screen, and then immediately the booting system. It's not clear whether a connection problem or some delay in the VNC connection leaves us blind during the first screens, or whether GRUB was skipped somehow.
Updated by jlausuch almost 3 years ago
- Tags set to bug
- Project changed from openQA Tests (public) to 208
- Category deleted (Bugs in existing tests)
- Status changed from New to Workable
- Priority changed from Urgent to Normal
Updated by favogt over 2 years ago
- Priority changed from Normal to Urgent
For some months now this failure has been appearing increasingly often, and tests have a <50% chance of succeeding. This pretty much completely breaks the automated release of openSUSE Images :-/
Updated by mloviska over 2 years ago
Here it looks like a resolution problem: at first it booted as shown in https://openqa.opensuse.org/tests/2833249#step/bootloader_uefi/3, and after a rerun (https://openqa.opensuse.org/tests/2833508#step/bootloader_uefi/3) it was fine.
The other failures point to either too many needles being checked or possibly an overloaded machine:
- https://openqa.opensuse.org/tests/2768612/logfile?filename=autoinst-log.txt
- https://openqa.suse.de/tests/9798733/logfile?filename=autoinst-log.txt
[2022-10-24T09:47:20.200442+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.84 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:20.200988+02:00] [debug] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:20.348237+02:00] [debug] no change: 88.0s
[2022-10-24T09:47:20.851836+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.50 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:20.852334+02:00] [debug] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:21.348842+02:00] [debug] no change: 87.0s
[2022-10-24T09:47:21.842001+02:00] [debug] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:22.357924+02:00] [debug] no change: 86.0s
[2022-10-24T09:47:22.857535+02:00] [debug] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:30.999404+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.65 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:30.999825+02:00] [debug] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:31.830439+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.81 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:31.830873+02:00] [debug] no match: 77.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:32.739715+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:32.740141+02:00] [debug] no match: 76.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:41.023929+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 8.02 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:41.024349+02:00] [debug] no match: 75.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:41.782163+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.72 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:41.782558+02:00] [debug] no match: 67.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:42.589760+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.55 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:42.590161+02:00] [debug] no match: 66.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:51.015102+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.98 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:51.015512+02:00] [debug] no match: 65.3s, best candidate: bootloader_uefi-20210917 (0.00)
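To quantify how often the needle check stalls, the warnings in the excerpt above could be extracted from autoinst-log.txt with a small script. This is only a sketch: the function name `slow_checks` and the 5-second threshold are assumptions for illustration, and the pattern is derived solely from the log lines quoted in this ticket.

```python
import re

# Pattern derived from the warn lines quoted in this ticket; newer logs may
# carry an extra "[pid:NNN]" field, which the ".*" below tolerates.
WARN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[warn\] .*check_asserted_screen took "
    r"(?P<secs>\d+(?:\.\d+)?) seconds for (?P<count>\d+) candidate needles"
)


def slow_checks(log_lines, threshold=5.0):
    """Return (timestamp, seconds, needle_count) for checks at or above threshold.

    The threshold is an arbitrary illustrative value, not an openQA setting.
    """
    hits = []
    for line in log_lines:
        match = WARN_RE.search(line)
        if match and float(match.group("secs")) >= threshold:
            hits.append(
                (match.group("ts"), float(match.group("secs")), int(match.group("count")))
            )
    return hits


# Lines copied from the excerpt above.
SAMPLE = [
    "[2022-10-24T09:47:20.200442+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.84 seconds for 15 candidate needles - make your needles more specific",
    "[2022-10-24T09:47:20.200988+02:00] [debug] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)",
    "[2022-10-24T09:47:30.999404+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.65 seconds for 15 candidate needles - make your needles more specific",
    "[2022-10-24T09:47:41.023929+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 8.02 seconds for 15 candidate needles - make your needles more specific",
]

for ts, secs, count in slow_checks(SAMPLE):
    print(f"{ts}: {secs:.2f}s for {count} needles")
```

In the excerpt, the checks of roughly 8 seconds recur about every 10 seconds, during which the test cannot react to screen changes.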
Updated by mloviska over 2 years ago
From a brief check of the workers, the resolution issue occurs on worker ip-10-252-32-90.
Updated by slo-gin over 2 years ago
This ticket was set to Urgent priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by favogt over 2 years ago
mloviska wrote:
https://gitlab.suse.de/openqa/os-autoinst-needles-sles/-/merge_requests/1628
https://github.com/os-autoinst/os-autoinst-needles-opensuse/pull/785
Merged, but unfortunately the tests still fail most of the time.
Updated by ph03nix over 2 years ago
- Priority changed from Urgent to High
Lowering priority to High due to inactivity.
The issue is however still (sporadically) present, e.g. https://openqa.opensuse.org/tests/2833249#step/bootloader_uefi/9
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: rescue
https://openqa.opensuse.org/tests/2888244#step/bootloader_uefi/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by slo-gin over 2 years ago
This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by slo-gin about 2 years ago
This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by slo-gin about 2 years ago
This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by ilausuch almost 2 years ago
- Priority changed from High to Normal
Lowering the priority to Normal, as agreed with Jose.
Updated by jlausuch almost 2 years ago
- Related to action #129601: bootloader_uefi fails too often in aarch64 added
Updated by ph03nix about 1 year ago
- Tags changed from bug to bug, need-info
- Subject changed from [qac][jeos][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [needinfo][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen
I don't quite understand this ticket or whether it is still relevant. Need more info or someone to clarify.
Updated by favogt about 1 year ago
- Subject changed from [needinfo][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen
ph03nix wrote in #note-15:
I don't quite understand this ticket or whether it is still relevant. Need more info or someone to clarify.
In the linked job history I don't actually see any failures of this kind.
Here's a recent example on o3: https://openqa.opensuse.org/tests/3909341#step/disk_boot/7
As you can see, openQA doesn't see grub2 at all, only black screen and the booting system. In the video, grub isn't visible either. It's not clear to me whether openQA was just very unlucky or grub was actually skipped somehow.
Updated by okurz about 1 year ago
Also, judging from the original scenario, merely adapted for SLE15-SP5, https://openqa.suse.de/tests/latest?arch=aarch64&distri=sle&flavor=JeOS-for-kvm-and-xen-Updates&machine=aarch64&test=jeos-ltp-syscalls&version=15-SP5#next_previous I can only find 20 jobs right now, with no related failures. That is not enough data to declare this fixed.
favogt wrote in #note-16:
Here's a recent example on o3: https://openqa.opensuse.org/tests/3909341#step/disk_boot/7
As you can see, openQA doesn't see grub2 at all, only black screen and the booting system. In the video, grub isn't visible either. It's not clear to me whether openQA was just very unlucky or grub was actually skipped somehow.
From logs:
[2024-02-01T12:11:21.073089Z] [debug] [pid:37283] ||| starting disk_boot tests/microos/disk_boot.pm
[2024-02-01T12:11:21.081872Z] [debug] [pid:37283] tests/microos/disk_boot.pm:29 called opensusebasetest::wait_boot -> products/microos/../../lib/opensusebasetest.pm:898 called opensusebasetest::handle_grub -> products/microos/../../lib/opensusebasetest.pm:683 called opensusebasetest::wait_grub -> products/microos/../../lib/opensusebasetest.pm:439 called testapi::assert_screen
[2024-02-01T12:11:21.082660Z] [debug] [pid:37283] <<< testapi::assert_screen(mustmatch=[
"bootloader-shim-import-prompt",
"grub2",
"inst-bootmenu"
], timeout=300)
[2024-02-01T12:11:28.107568Z] [warn] [pid:37425] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 6.05 seconds for 127 candidate needles - make your needles more specific
So a check taking 6 seconds can mean that we miss the grub screen, but only if the grub menu has a timeout enabled, which shouldn't happen. Is the image no longer modified to disable the grub timeout?
But the real problem is the 127 (!) candidate needles, a number that should be reduced, for example by deleting old, duplicate, or unspecific needles and by excluding irrelevant needles with the corresponding ENV flags.
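To see which needles contribute to such a large candidate set, one could count the needle files carrying any of the tags passed to assert_screen. The sketch below assumes a local checkout of a needles repository in which each needle is a JSON file with a "tags" list; the function names `needles_per_tag` and `candidate_needles` are hypothetical, and the sketch ignores any further filtering openQA itself may apply, so its numbers are only an upper-bound estimate.

```python
import json
from collections import Counter
from pathlib import Path


def _load_tags(path):
    """Return the tag list of one needle file, or [] if it cannot be parsed."""
    try:
        needle = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return []
    if not isinstance(needle, dict):
        return []
    return needle.get("tags", [])


def needles_per_tag(needle_dir):
    """Count how many needle files carry each tag."""
    counts = Counter()
    for path in Path(needle_dir).rglob("*.json"):
        counts.update(set(_load_tags(path)))
    return counts


def candidate_needles(needle_dir, tags):
    """List needle files carrying at least one of the requested tags."""
    wanted = set(tags)
    return sorted(
        path
        for path in Path(needle_dir).rglob("*.json")
        if wanted & set(_load_tags(path))
    )


if __name__ == "__main__":
    # Tags taken from the assert_screen call in the log above; the directory
    # name is a placeholder for a local needles checkout.
    tags = ["bootloader-shim-import-prompt", "grub2", "inst-bootmenu"]
    for needle in candidate_needles("needles", tags):
        print(needle)
```

Sorting the result of `needles_per_tag` by count would show which of the three tags accounts for most of the candidates, and therefore where cleanup is likely to pay off first.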
Updated by ph03nix about 1 year ago
- Tags changed from bug, need-info to bug, sporadic
- Subject changed from [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but openQA still expects the grub edit screen
- Description updated (diff)
Updated by ph03nix about 1 year ago
I see we already applied a workaround for OSD in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/a116bdebd21094ed7dabe482eacc9dd541d7957d/tests/microos/disk_boot.pm#L20 - perhaps we just need to apply this to O3 as well?