action #113447

closed

[sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but openQA still expects the grub edit screen

Added by okurz almost 3 years ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
2022-07-10
Due date:
% Done:

0%

Estimated time:

Description

Observation

openQA test in scenario sle-15-SP4-JeOS-for-kvm-and-xen-Updates-aarch64-jeos-ltp-syscalls@aarch64 fails in
bootloader_uefi as the system continues to boot but the test expects the grub edit screen.

A more recent failure is also

openQA test in scenario microos-Tumbleweed-MicroOS-Image-ContainerHost-aarch64-container-host@aarch64 fails in
disk_boot

Problem

openQA doesn't see GRUB: at first there is only a black screen, then immediately the booting system. It's not clear whether a problem or delay in the VNC connection leaves us blind during the first screens, or whether GRUB was skipped somehow.


Related issues 1 (0 open, 1 closed)

Related to openQA Tests (public) - action #129601: bootloader_uefi fails too often in aarch64 (Resolved, jlausuch, 2023-05-19)

Actions #1

Updated by jlausuch almost 3 years ago

  • Tags set to bug
  • Project changed from openQA Tests (public) to 208
  • Category deleted (Bugs in existing tests)
  • Status changed from New to Workable
  • Priority changed from Urgent to Normal
Actions #2

Updated by favogt over 2 years ago

  • Priority changed from Normal to Urgent

Since some months ago this failure has appeared increasingly often, and tests have a <50% chance of succeeding. This pretty much completely breaks the automated release of openSUSE images :-/

Actions #3

Updated by mloviska over 2 years ago

Over here it looks like a resolution problem: at first it booted as in https://openqa.opensuse.org/tests/2833249#step/bootloader_uefi/3, and after a rerun (https://openqa.opensuse.org/tests/2833508#step/bootloader_uefi/3) it was fine.

The other problem points to either too many needles being checked or possibly an overloaded machine.

[2022-10-24T09:47:20.200442+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.84 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:20.200988+02:00] [debug] no match: 89.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:20.348237+02:00] [debug] no change: 88.0s
[2022-10-24T09:47:20.851836+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.50 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:20.852334+02:00] [debug] no match: 88.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:21.348842+02:00] [debug] no change: 87.0s
[2022-10-24T09:47:21.842001+02:00] [debug] no match: 87.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:22.357924+02:00] [debug] no change: 86.0s
[2022-10-24T09:47:22.857535+02:00] [debug] no match: 86.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:30.999404+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.65 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:30.999825+02:00] [debug] no match: 85.0s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:31.830439+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.81 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:31.830873+02:00] [debug] no match: 77.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:32.739715+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.74 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:32.740141+02:00] [debug] no match: 76.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:41.023929+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 8.02 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:41.024349+02:00] [debug] no match: 75.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:41.782163+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.72 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:41.782558+02:00] [debug] no match: 67.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:42.589760+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.55 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:42.590161+02:00] [debug] no match: 66.3s, best candidate: bootloader_uefi-20210917 (0.00)
[2022-10-24T09:47:51.015102+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.98 seconds for 15 candidate needles - make your needles more specific
[2022-10-24T09:47:51.015512+02:00] [debug] no match: 65.3s, best candidate: bootloader_uefi-20210917 (0.00)
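
To make the stalls in a log like the one above easier to spot, here is a minimal Python sketch that pulls out the slow `check_asserted_screen` warnings. The function name `slow_checks` and the 5-second threshold are assumptions chosen for illustration; only the log line format is taken from the excerpt.

```python
import re

# Matches the warning format seen in the autoinst log excerpt above.
# The optional "[pid:...]" field that newer logs carry is skipped by ".*?".
WARN_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[warn\].*?check_asserted_screen took "
    r"(?P<secs>\d+(?:\.\d+)?) seconds for (?P<count>\d+) candidate needles"
)


def slow_checks(log_lines, threshold=5.0):
    """Return (timestamp, seconds, needle_count) for checks at or above threshold.

    The threshold is a hypothetical cut-off, not a value openQA defines.
    """
    hits = []
    for line in log_lines:
        match = WARN_RE.search(line)
        if match and float(match.group("secs")) >= threshold:
            hits.append(
                (match.group("ts"), float(match.group("secs")), int(match.group("count")))
            )
    return hits


# Three lines copied from the log excerpt above.
SAMPLE = [
    "[2022-10-24T09:47:30.999404+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 7.65 seconds for 15 candidate needles - make your needles more specific",
    "[2022-10-24T09:47:31.830439+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 0.81 seconds for 15 candidate needles - make your needles more specific",
    "[2022-10-24T09:47:41.023929+02:00] [warn] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 8.02 seconds for 15 candidate needles - make your needles more specific",
]

for timestamp, seconds, count in slow_checks(SAMPLE):
    print(f"{timestamp}: {seconds:.2f}s for {count} needles")
```

In the excerpt the slow checks recur roughly every 10 seconds with the same needle count, which would fit the overloaded-machine theory better than the needle count alone, though this sketch cannot confirm that.
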
Actions #4

Updated by mloviska over 2 years ago

After briefly checking the workers: the resolution issue occurs on worker ip-10-252-32-90.

Actions #6

Updated by slo-gin over 2 years ago

This ticket was set to Urgent priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #8

Updated by ph03nix over 2 years ago

  • Priority changed from Urgent to High

Lowering priority to high due to inactivity.

The issue is however still (sporadically) present, e.g. https://openqa.opensuse.org/tests/2833249#step/bootloader_uefi/9

Actions #9

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: rescue
https://openqa.opensuse.org/tests/2888244#step/bootloader_uefi/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #10

Updated by slo-gin over 2 years ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #11

Updated by slo-gin about 2 years ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #12

Updated by slo-gin about 2 years ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #13

Updated by ilausuch almost 2 years ago

  • Priority changed from High to Normal

We lower the priority to normal in sync with Jose

Actions #14

Updated by jlausuch almost 2 years ago

  • Related to action #129601: bootloader_uefi fails too often in aarch64 added
Actions #15

Updated by ph03nix about 1 year ago

  • Tags changed from bug to bug, need-info
  • Subject changed from [qac][jeos][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [needinfo][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen

I don't quite understand this ticket and whether it is still relevant. Need more info or someone to clarify.

Actions #16

Updated by favogt about 1 year ago

  • Subject changed from [needinfo][sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen

ph03nix wrote in #note-15:

I don't quite understand this ticket and whether it is still relevant. Need more info or someone to clarify.

In the linked job history I don't actually see any failures of this kind.

Here's a recent example on o3: https://openqa.opensuse.org/tests/3909341#step/disk_boot/7

As you can see, openQA doesn't see grub2 at all, only black screen and the booting system. In the video, grub isn't visible either. It's not clear to me whether openQA was just very unlucky or grub was actually skipped somehow.

Actions #17

Updated by okurz about 1 year ago

Also, judging from the original scenario, only adapted for SLE15-SP5 (https://openqa.suse.de/tests/latest?arch=aarch64&distri=sle&flavor=JeOS-for-kvm-and-xen-Updates&machine=aarch64&test=jeos-ltp-syscalls&version=15-SP5#next_previous), I can only find 20 jobs right now, with no related failures. So there are not enough statistics to declare this fixed.

favogt wrote in #note-16:

Here's a recent example on o3: https://openqa.opensuse.org/tests/3909341#step/disk_boot/7

As you can see, openQA doesn't see grub2 at all, only black screen and the booting system. In the video, grub isn't visible either. It's not clear to me whether openQA was just very unlucky or grub was actually skipped somehow.

From logs:



[2024-02-01T12:11:21.073089Z] [debug] [pid:37283] ||| starting disk_boot tests/microos/disk_boot.pm
[2024-02-01T12:11:21.081872Z] [debug] [pid:37283] tests/microos/disk_boot.pm:29 called opensusebasetest::wait_boot -> products/microos/../../lib/opensusebasetest.pm:898 called opensusebasetest::handle_grub -> products/microos/../../lib/opensusebasetest.pm:683 called opensusebasetest::wait_grub -> products/microos/../../lib/opensusebasetest.pm:439 called testapi::assert_screen
[2024-02-01T12:11:21.082660Z] [debug] [pid:37283] <<< testapi::assert_screen(mustmatch=[
    "bootloader-shim-import-prompt",
    "grub2",
    "inst-bootmenu"
  ], timeout=300)
[2024-02-01T12:11:28.107568Z] [warn] [pid:37425] !!! backend::baseclass::check_asserted_screen: check_asserted_screen took 6.05 seconds for 127 candidate needles - make your needles more specific

So 6 seconds can mean that we miss a grub screen, but only if the grub menu has a timeout enabled, which shouldn't happen. Is the image no longer changed to disable the grub timeout?

But the real problem is the 127 (!) candidate needles, which should be reduced, for example by deleting old/duplicate/unspecific needles, excluding irrelevant needles with the corresponding ENV flags, and such.
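
The timing argument can be sketched in Python under a deliberately simplified model: each screen check looks at one frame when it starts, and the next check only starts once the previous one has finished. That model, the function names, and the 1-second/5-second menu window are assumptions for illustration, not a description of how os-autoinst actually schedules checks.

```python
from itertools import accumulate


def check_start_times(durations):
    """Start time of each check if checks run back to back from t=0."""
    return [0.0] + list(accumulate(durations))[:-1]


def menu_observed(shown_at, timeout, durations):
    """True if at least one check starts while the menu is on screen.

    shown_at and timeout describe a hypothetical GRUB menu window;
    timeout=float("inf") models a menu with the timeout disabled.
    """
    hidden_at = shown_at + timeout
    return any(shown_at <= t < hidden_at for t in check_start_times(durations))


# One 6.05 s check (as in the log above) followed by two faster ones.
slow_run = [6.05, 0.8, 0.8]
# Ten fast checks, as one might hope for with fewer candidate needles.
fast_run = [0.8] * 10

print(menu_observed(1.0, 5.0, slow_run))           # menu with a 5 s timeout
print(menu_observed(1.0, float("inf"), slow_run))  # menu with timeout disabled
print(menu_observed(1.0, 5.0, fast_run))           # same 5 s menu, fast checks
```

Under these assumptions a single slow check is enough to step over a menu that disappears on its own, while a menu without a timeout is still seen by the next check, which is why both the grub timeout setting and the needle count matter.
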

Actions #18

Updated by ph03nix about 1 year ago

  • Tags changed from bug, need-info to bug, sporadic
  • Subject changed from [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but the test expects the grub edit screen to [sporadic][aarch64] test fails in bootloader_uefi as the system continues to boot but openQA still expects the grub edit screen
  • Description updated (diff)
Actions #19

Updated by ph03nix about 1 year ago

I see we already applied a workaround for OSD in https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/a116bdebd21094ed7dabe482eacc9dd541d7957d/tests/microos/disk_boot.pm#L20 - perhaps we just need to apply this to O3 as well?

Actions #20

Updated by ph03nix 7 months ago

  • Status changed from Workable to Closed

Closing as outdated.

Actions #21

Updated by ph03nix 6 months ago

  • Tags changed from bug, sporadic to MinimalVM
Actions #22

Updated by ph03nix 6 months ago

  • Project changed from 208 to Containers and images