action #25658
closed [sle][functional][migration][opensuse][virtualization] Increase/disable timeout of initial grub menu to ensure tests do not miss it
Added by okurz about 7 years ago. Updated about 7 years ago.
0%
Description
Observation
Many openQA tests, across all scenarios and regardless of worker or machine type, sometimes skip over the grub menu and then fail on a fully booted desktop while still looking for "grub2". Sometimes the menu shows up briefly in the video, sometimes not at all.
Example: opensuse-5.10.90-Krypton-Live-x86_64-krypton-live-installation@64bit-2G fails in grub_test
Reproducible
In all kinds of scenarios, but not reproducible every time
Expected result
After the machine reboots, the grub screen should be caught within the 8 second default timeout: https://openqa.opensuse.org/tests/494564#step/grub_test/1
Problem
It seems we cannot ensure a reliable testing environment where the full screen content shows up in the early phases of boot. Either this is a problem we always had but that was not important enough to handle because there were fewer test scenarios, or changes in os-autoinst introduced a regression.
Suggestion
In any case, a fix to prevent this would be to configure the bootloader in the installer to use a much higher timeout, e.g. 60 seconds, or to disable the timeout entirely.
- take a look in tests/installation/disable_grub_graphics.pm at how it enters the bootloader configuration menu from the installer and disables the grub graphics (time estimation: 0.1 - 0.5h)
- use the same approach in a new test module with an explicit name, e.g. 'change_grub_timeout.pm', to bump the timeout to 60 seconds (or a similarly high value) or disable the grub timeout (time estimation: 0.5 - 2h)
- make sure this test module is used in all relevant tests, e.g. all openSUSE+SLE installation tests, including the live cd (time estimation: 0.5 - 4h)
- changes to "first_boot" or the wait_boot function should not be necessary: the bootloader menu is still searched for within the normal timeout, and bumping the grub timeout itself should ensure the menu stays visible long enough to be caught
Further details
Always latest result in this scenario: latest
Updated by okurz about 7 years ago
- Related to action #12894: [sles][functional][ppc64] boot_to_snapshot sporadic fails in grub_test added
Updated by okurz about 7 years ago
- Blocked by action #17502: [sle][functional]test fails in grub_test on timeout waiting for inst-bootmenu added
Updated by okurz about 7 years ago
- Related to deleted (action #12894: [sles][functional][ppc64] boot_to_snapshot sporadic fails in grub_test)
Updated by okurz about 7 years ago
- Blocks action #12894: [sles][functional][ppc64] boot_to_snapshot sporadic fails in grub_test added
Updated by okurz about 7 years ago
- Blocked by deleted (action #17502: [sle][functional]test fails in grub_test on timeout waiting for inst-bootmenu )
Updated by okurz about 7 years ago
- Blocks action #17502: [sle][functional]test fails in grub_test on timeout waiting for inst-bootmenu added
Updated by okurz about 7 years ago
- Blocks action #18304: [sles][migration] test fails in grub_test_snapshot: Can't find desired snapshot in grub menu added
Updated by okurz about 7 years ago
- Blocks action #25388: [sle][functional][sle15][aarch64]test fails in grub_test added
Updated by okurz about 7 years ago
- Blocks action #25376: [sle][functional][ppc64le][hard] test fails in grub_test - stuck in grub screen, expected encrypted password prompt (was: boots completely) added
Updated by okurz about 7 years ago
- Due date set to 2017-10-25
- Target version set to Milestone 11
Updated by okurz about 7 years ago
- Blocks action #25664: [opensuse][qam]test fails to see grub menue in time in console_reboot even though it's visible but probably just a little bit too late added
Updated by okurz about 7 years ago
- Status changed from New to In Progress
- Assignee set to JERiveraMoya
JERiveraMoya partnering with SLindoMansilla
Updated by JERiveraMoya about 7 years ago
https://github.com/jknphy/os-autoinst-distri-opensuse/commit/780b957b90ad51d190e2949f22fe985d61ed9630
https://gitlab.suse.de/JERiveraMoya/os-autoinst-needles-sles/commit/82dad959a13b06477284a1a25e225c8425f65080
Needles for Tumbleweed are still needed. I still have to send another PR to my fork and check with Sergio whether the two commits above in my forks can be improved before sending them to the official repo.
These two should cover text-mode and graphical scenarios for Tumbleweed. Successfully tested locally in four scenarios.
Updated by okurz about 7 years ago
Feel free to provide a PR and mark the subject line as "[WIP]" to prevent it from being merged while still inviting early feedback. Until then I will take a look at your github commit and comment there.
EDIT: hm, I thought I could comment on a commit as well but I can't. Please provide a PR then
Updated by okurz about 7 years ago
To make needling easier we should fully disable the timeout. Can you do that?
Updated by JERiveraMoya about 7 years ago
GRUB_TIMEOUT=-1 will cause the menu to be displayed until a boot entry is selected manually.
Renamed tests and needles from change_grub_timeout to disable_grub_timeout.
This change does not make needling easier, given that it just means typing -1 instead of 60 seconds. The added needles are only needed to verify the path to the field where the value is typed (e.g. correct tab selected).
Found missing needles in Tumbleweed text mode. Currently fixing it before sending [WIP] PR.
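For illustration only, the effect of the setting discussed above can be sketched as a rewrite of the GRUB_TIMEOUT line in a grub defaults file. This is a minimal sketch, not how the test module works (the module changes the value through the installer UI); the function name `set_grub_timeout` and the sample file content are assumptions made for this example.

```python
import re


def set_grub_timeout(config_text: str, timeout: int) -> str:
    """Return config_text with GRUB_TIMEOUT set to `timeout`.

    A value of -1 makes grub show the menu until an entry is selected.
    Hypothetical helper for illustration; not part of any openQA library.
    """
    line = f"GRUB_TIMEOUT={timeout}"
    # Match only the exact key, not e.g. GRUB_TIMEOUT_STYLE.
    pattern = re.compile(r"^GRUB_TIMEOUT=.*$", re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub(line, config_text)
    # No existing setting: append one, keeping the file newline-terminated.
    sep = "" if config_text.endswith("\n") or not config_text else "\n"
    return f"{config_text}{sep}{line}\n"


# Assumed sample content; 8 seconds is the default mentioned in this ticket.
sample = 'GRUB_DEFAULT=saved\nGRUB_TIMEOUT=8\nGRUB_CMDLINE_LINUX=""\n'
updated = set_grub_timeout(sample, -1)
print(updated)
```

On a real system the generated grub configuration would still need to be regenerated after such a change; that step is out of scope for this sketch.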
Updated by okurz about 7 years ago
Great news, good change. Feel free to create a PR anyway once it works for just one scenario. We will comment on the PR, and I am sure there will be at least something to change that has an impact. Better to get fast feedback.
Updated by JERiveraMoya about 7 years ago
Tested for Leap 42.3 Updates and Maintenance. New needles were created for openSUSE Leap. There are some subtle differences between Tumbleweed and Leap (e.g. the lines that draw the menu tabs have different lengths, and the box around selected text is bigger in one than in the other).
Updated by JERiveraMoya about 7 years ago
Added several improvements to the PRs based on review comments: syntax, better use of openQA libraries, visualization of the change with a screenshot, etc.
Updated by JERiveraMoya about 7 years ago
As discussed with osukup, looking into re-needling in the "Maintenance: Test repo" job group.
Updated by okurz about 7 years ago
- Related to action #26116: [sle][functional] test fails in disable_grub_timeout - test flow seems wrong added
Updated by okurz about 7 years ago
jrivera: looks like leap 15.0 uses alt-l? https://openqa.opensuse.org/tests/506564#step/disable_grub_timeout/6
Updated by JERiveraMoya about 7 years ago
Fixed several issues for other products:
- Test included in more test scenarios.
- SLE 12 GA: Fixed a problem with the accepted range for Timeout, [0,300]. For this product version, 60 seconds is typed instead of -1.
- SLE 12 SP1: missing needles.
- openSUSE Leap: different shortcut to access the tab, plus corresponding needles.
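The SLE 12 GA special case above can be sketched as a small decision: type -1 when the field accepts it, otherwise fall back to a long timeout. This is only a sketch under assumptions: the function name `timeout_to_type` is hypothetical, and the (-1, 300) range used for "other products" below is an illustrative guess; only the [0,300] range for SLE 12 GA and the 60 second fallback come from this ticket.

```python
def timeout_to_type(min_allowed: int, max_allowed: int, fallback: int = 60) -> str:
    """Return the string to type into the bootloader timeout field.

    Prefers -1 (timeout disabled); if the field rejects -1, uses
    `fallback` seconds, clamped into the accepted range.
    Hypothetical helper for illustration only.
    """
    disable = -1
    if min_allowed <= disable <= max_allowed:
        return str(disable)
    return str(min(max(fallback, min_allowed), max_allowed))


# SLE 12 GA accepts [0,300] according to the comment above.
print(timeout_to_type(0, 300))
# Assumed range for a product whose field accepts -1.
print(timeout_to_type(-1, 300))
```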
Updated by AndreasStieger about 7 years ago
But this seems to fail on Leap Maintenance?
disable_grub_timeout is failing in Leap Maintenance tests:
https://openqa.opensuse.org/tests/507382#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507383#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507384#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507385#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507386#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507387#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507391#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507392#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507393#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507394#step/disable_grub_timeout/4
https://openqa.opensuse.org/tests/507395#step/disable_grub_timeout/7
https://openqa.opensuse.org/tests/507396#step/disable_grub_timeout/7
Updated by okurz about 7 years ago
Looks like the wrong hotkey for leap 42.3. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3742 should be able to handle this. Verified.
Updated by okurz about 7 years ago
- Related to action #26868: [sle][functional][hyperv]test fails in disable_grub_timeout, 'alt-t' was pressed when it should have been 'alt-r' added
Updated by okurz about 7 years ago
- Related to action #26898: [sle][functional][opensuse]boot_to_snapshot: test fails in first_boot added
Updated by JERiveraMoya about 7 years ago
New PR for fixing most of the found failures (hopefully): https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3749
Found today that on sle-15 for aarch64 the hotkey is 'alt + t': https://openqa.suse.de/tests/1224390#step/disable_grub_timeout/3
Updated by JERiveraMoya about 7 years ago
The last PR seems to work, except for some needles in SLE-15 that are in a different position on the screen; we might need another wait_still_screen(1) before sending the key that selects the tab. However, the exception for leap 42.2 & 42.3 on machine 64bit-2G is not implemented yet, so a lot of tests fail, as expected. Added a PR as a workaround:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3755
Updated by SLindoMansilla about 7 years ago
PR to fix wrong hotkey on x86_64: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3760
Merged
Verified on OSD: https://openqa.suse.de/tests/1227894#step/disable_grub_timeout/4
Updated by SLindoMansilla about 7 years ago
- H1: REJECTED by E1-1. Machines/workers with 2G are causing the different-hotkey problem in module disable_grub_timeout.
- H2: Jobs without UEFI on Leap versions older than Leap 15 are causing the different-hotkey problem in module disable_grub_timeout.
- H3: Jobs without UEFI on SLE versions older than 12-SP2 are also affected by the different-hotkey problem in module disable_grub_timeout.
E1-1: Find jobs with 2G machines where the module disable_grub_timeout works.
R1-1: Works fine for the same build, version and arch for jobs with UEFI:
E2-1: Find all jobs with 2G machines that don't have UEFI set and don't fail in the module disable_grub_timeout.
R2-1: None found.
E2-2: Find all jobs with 2G machines that don't have UEFI set and fail in the module disable_grub_timeout.
R2-2: All jobs that execute the module disable_grub_timeout and don't have UEFI fail:
E3-1: Find jobs with UEFI for SLE 12-SP1 that fail in module disable_grub_timeout.
Updated by SLindoMansilla about 7 years ago
PR to fix non-UEFI jobs for Leap 42.2: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3764
Closed.
Superseded by https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3762
Updated by okurz about 7 years ago
The failures I found that I consider most critical are in the part you mentioned as "not implemented yet" for openSUSE Leap maintenance tests. This should also be handled over the weekend; maintenance updates never sleep, especially not on openSUSE ;-)
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3762 takes a two-fold approach: it adds an explicit assert_screen to make sure the bootloader settings are shown before trying to press any key. Also, it accepts the case where the key to switch the tab cannot be found and returns without failing.
Updated by okurz about 7 years ago
Tests fixed for the time being with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3765 by cancelling the configuration module when the hotkey did not match. I think a safer solution is to use send_key_until_needle_match 'installation-bootloader-options', 'tab' which should work for both ncurses and the GUI.
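The idea behind that suggestion, pressing a key repeatedly until the expected screen appears instead of relying on one fixed hotkey, can be modelled roughly as follows. This is a simplified toy model, not the os-autoinst implementation: `send_key_until_match`, `FakeInstaller`, the `max_presses` default and the widget names other than "release notes" and "bootloader options" are all assumptions made for this example.

```python
from typing import Callable, List


def send_key_until_match(screen_matches: Callable[[], bool],
                         send_key: Callable[[], None],
                         max_presses: int = 20) -> bool:
    """Press a key until the expected screen is seen.

    Returns True on a match, False if max_presses was exhausted.
    """
    for _ in range(max_presses):
        if screen_matches():
            return True
        send_key()
    return screen_matches()


class FakeInstaller:
    """Toy UI where each 'tab' press moves focus to the next widget."""

    def __init__(self, widgets: List[str], target: str) -> None:
        self.widgets = widgets
        self.target = target
        self.focus = 0
        self.presses = 0

    def press_tab(self) -> None:
        self.focus = (self.focus + 1) % len(self.widgets)
        self.presses += 1

    def on_target(self) -> bool:
        return self.widgets[self.focus] == self.target


# The extra "release notes" button shifts positions and hotkeys, but the
# loop still reaches the target because it does not depend on either.
ui = FakeInstaller(
    ["release notes", "boot code options", "kernel parameters", "bootloader options"],
    "bootloader options",
)
found = send_key_until_match(ui.on_target, ui.press_tab)
print(found, ui.presses)
```

The trade-off is a few extra key presses per run in exchange for not having to maintain a per-product hotkey table.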
Updated by okurz about 7 years ago
- Assignee changed from JERiveraMoya to okurz
@SLindoMansilla, @JERiveraMoya: There is one hypothesis which you have not mentioned: it seems that there actually is a differing button. I wonder whether you have seen this, but it is visible in the needles I created. Compare https://github.com/os-autoinst/os-autoinst-needles-opensuse/pull/282/files#diff-0a98fc599774cf73e5f165d6483189a5 and https://github.com/os-autoinst/os-autoinst-needles-opensuse/pull/282/files#diff-295cbb639bb910fede484a4855037ee2 . The hotkey for "bootloader options" changes because the second screen has an additional "release notes" button in the top right that takes "alt-l".
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3762 and the corresponding needle PRs cover this now.
Updated by JERiveraMoya about 7 years ago
Investigating why only 0 is typed instead of -1 (this issue makes grub_test fail). Not reproducible on my local machine. The logs show that -1 is typed. It seems to be related to the typing speed.
The textbox only detects "-" and after "ret" goes back to "0".
https://openqa.suse.de/tests/1230788#step/disable_grub_timeout/5
https://openqa.suse.de/tests/1231104#step/disable_grub_timeout/8
It might help to use: type_string_slow. PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3777
Updated by okurz about 7 years ago
PR merged and active on both osd and o3. Now we sometimes seem to have the problem with the '0', so it seems we are not done yet. I will also try to run it more often locally.
type_string_slow would only wait between the characters, and it's just two characters. Happened again in https://openqa.opensuse.org/tests/511736#step/disable_grub_timeout/8
I will try to reproduce locally as well. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3778 serves as a temporary workaround to disable it again.
Updated by SLindoMansilla about 7 years ago
- Related to action #26936: disable_grub_timeout test broken added
Updated by okurz about 7 years ago
Could not reproduce the "0 typing" problem locally in 40 runs, so my best guess remains that the production environment is different and something like a wait_still_screen(1) could help in production. This is included in my PR. I leave it to reviewers to decide about that.
Updated by JERiveraMoya about 7 years ago
Missing needles for SLE-12 GA: https://gitlab.suse.de/openqa/os-autoinst-needles-sles/merge_requests/544
Updated by okurz about 7 years ago
- Status changed from In Progress to Resolved
- Assignee changed from okurz to JERiveraMoya
I consider us done here. The last change I made was adding an additional wait_still_screen(1) to fix a sporadic problem we could never see locally. https://openqa.suse.de/tests/1231772 is the verification on SLE15, https://openqa.suse.de/tests/1232728#step/disable_grub_timeout/14 is one on SLE 12 SP2 maintenance tests. https://openqa.opensuse.org/tests/512274 is an openSUSE staging test. I am currently monitoring maintenance tests for any missing parts, e.g. missing needles or jobs that still need to be retriggered, which I consider out of scope of this ticket, so I am setting it to resolved. Also setting the assignee back to JERiveraMoya, who did the main work here.
Updated by okurz about 7 years ago
- Related to action #26978: autoyast/gnome installation stuck on grub added