action #49751
closed[functional][u] test fails in grub_test - isotovideo missed the boot screen while worker-host was likely under heavy load
0%
Description
Observation
openQA test in scenario sle-15-SP1-Installer-DVD-aarch64-textmode@aarch64 fails in grub_test
Test suite description
Maintainer: okurz
Installation in textmode and selecting the textmode "desktop" during installation.
Acceptance criteria
- AC1: The test module doesn't fail if we miss the grub or tianocore menu
- AC2: There is better visualization for the reviewer if we miss one of those screens
Suggestions
- Wrap that assert_screen inside an eval so that, if it fails, we can still try to find out whether the system has already booted by checking whether the login string is already present on the serial console. If it is, either softfail or fail the grub_test.
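The control flow of this suggestion can be sketched as follows. This is only an illustration in Python, not the Perl code of grub_test: `ScreenTimeout`, `check_boot`, `saw_screen`, `missed_screen`, and the `login:` marker are hypothetical names chosen for the example, and the two callables stand in for the screen check and the serial console.

```python
from typing import Callable


class ScreenTimeout(Exception):
    """Raised by the hypothetical screen-wait stand-in when nothing matches."""


def check_boot(wait_for_screen: Callable[[str, int], None],
               read_serial: Callable[[], str],
               login_marker: str = "login:",
               timeout: int = 90) -> str:
    """Return 'passed', 'softfail' or 'failed' for the boot check."""
    try:
        # Normal path: the boot menu is seen within the timeout.
        wait_for_screen("grub2", timeout)
        return "passed"
    except ScreenTimeout:
        # The boot menu was missed. Check whether the system booted anyway
        # by looking for the login string in the serial output.
        if login_marker in read_serial():
            return "softfail"
        return "failed"


# Hypothetical stand-ins used only to exercise the sketch.
def saw_screen(tag: str, timeout: int) -> None:
    return None


def missed_screen(tag: str, timeout: int) -> None:
    raise ScreenTimeout(tag)


if __name__ == "__main__":
    print(check_boot(saw_screen, lambda: ""))
    print(check_boot(missed_screen, lambda: "localhost login: "))
    print(check_boot(missed_screen, lambda: ""))
```

One limitation of this sketch: a login string left on the serial console from before the reboot would also satisfy the fallback check, so the marker alone does not prove that the system has rebooted.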
Early investigation
This is an easy one: the following log messages show that the SUT was able to boot, but isotovideo was not fast enough to catch the boot screen.
[2019-03-27T08:17:11.067 UTC] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/grub_test.pm:73 called utils::assert_screen_with_soft_timeout
[2019-03-27T08:17:11.068 UTC] [debug] <<< testapi::check_screen(mustmatch='grub2', timeout=90)
[2019-03-27T08:17:12.871 UTC] [debug] WARNING: check_asserted_screen took 1.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:12.871 UTC] [debug] no match: 89.9s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:14.279 UTC] [debug] WARNING: check_asserted_screen took 1.33 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:14.279 UTC] [debug] no match: 88.1s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:14.310 UTC] [debug] no change: 86.7s
[2019-03-27T08:17:50.564 UTC] [debug] WARNING: check_asserted_screen took 35.28 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.565 UTC] [debug] no match: 85.7s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:50.624 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 35.7843849658966 seconds
[2019-03-27T08:18:13.902 UTC] [debug] WARNING: check_asserted_screen took 23.24 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:13.914 UTC] [debug] no match: 50.4s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:13.937 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.3489730358124 seconds
[2019-03-27T08:18:15.429 UTC] [debug] WARNING: check_asserted_screen took 1.45 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:15.429 UTC] [debug] no match: 27.0s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:38.475 UTC] [debug] WARNING: check_asserted_screen took 23.00 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:38.488 UTC] [debug] no match: 25.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:38.517 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.0587909221649 seconds
[2019-03-27T08:18:39.363 UTC] [debug] WARNING: check_asserted_screen took 0.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:39.364 UTC] [debug] no match: 2.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:39.491 UTC] [debug] no change: 1.5s
[2019-03-27T08:19:00.836 UTC] [debug] WARNING: check_asserted_screen took 20.34 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:00.849 UTC] [debug] no match: 0.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:19:00.863 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 20.4782309532166 seconds
[2019-03-27T08:19:21.408 UTC] [debug] WARNING: check_asserted_screen took 20.52 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:22.816 UTC] [debug] >>> testapi::_check_backend_response: match=grub2 timed out after 90 (check_screen)
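To quantify how much of the wait was lost to the loaded worker, the warnings in the log above can be totalled with a small script. This is a minimal sketch, assuming the log format shown in this ticket. The function name `summarize` and the 10-second threshold for a "slow" needle check are arbitrary choices for illustration, not part of os-autoinst.

```python
import re

# WARNING lines copied from the log excerpt in this ticket.
LOG = """\
[2019-03-27T08:17:12.871 UTC] [debug] WARNING: check_asserted_screen took 1.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:14.279 UTC] [debug] WARNING: check_asserted_screen took 1.33 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.564 UTC] [debug] WARNING: check_asserted_screen took 35.28 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.624 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 35.7843849658966 seconds
[2019-03-27T08:18:13.902 UTC] [debug] WARNING: check_asserted_screen took 23.24 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:13.937 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.3489730358124 seconds
[2019-03-27T08:18:15.429 UTC] [debug] WARNING: check_asserted_screen took 1.45 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:38.475 UTC] [debug] WARNING: check_asserted_screen took 23.00 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:38.517 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.0587909221649 seconds
[2019-03-27T08:18:39.363 UTC] [debug] WARNING: check_asserted_screen took 0.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:00.836 UTC] [debug] WARNING: check_asserted_screen took 20.34 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:00.863 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 20.4782309532166 seconds
[2019-03-27T08:19:21.408 UTC] [debug] WARNING: check_asserted_screen took 20.52 seconds for 42 candidate needles - make your needles more specific
"""

STALL_RE = re.compile(r"detected a stall for ([\d.]+) seconds")
CHECK_RE = re.compile(r"check_asserted_screen took ([\d.]+) seconds")

# Arbitrary threshold (assumption) above which a needle check counts as slow.
SLOW_CHECK_SECONDS = 10.0


def summarize(log: str) -> dict:
    """Total the stall and needle-check durations reported in the log."""
    stalls = [float(s) for s in STALL_RE.findall(log)]
    checks = [float(s) for s in CHECK_RE.findall(log)]
    return {
        "stall_count": len(stalls),
        "stall_total": sum(stalls),
        "slow_checks": sum(1 for c in checks if c > SLOW_CHECK_SECONDS),
        "check_total": sum(checks),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

On this excerpt the reported stalls add up to roughly 103 seconds, which is more than the 90-second timeout passed to check_screen.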
Reproducible
Fails since (at least) Build 198.1 (current job)
Expected result
Last good: 196.1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mgriessmeier over 5 years ago
- Description updated (diff)
- Category changed from Bugs in existing tests to Enhancement to existing tests
- Status changed from New to Workable
- Target version set to Milestone 24
Updated by szarate over 5 years ago
Another example: https://openqa.suse.de/tests/2843289#step/reboot_gnome/11
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 24 to Milestone 25
move to M25
Updated by jorauch over 5 years ago
We have:
https://openqa.suse.de/tests/2901056#
as test for the PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7472
Apparently 'eval' does not stop the test from going to failed. Maybe we should run a multi-tag assert_screen and check with 'match_has_tag' where we are.
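The multi-tag idea can be sketched like this: wait once for any of several tags, then branch on which one matched. This is a Python illustration only, not the test API itself. `classify_boot`, the callable `wait_for_any`, and the tag name `linux-login` are hypothetical stand-ins for a multi-tag assert_screen followed by match_has_tag.

```python
from typing import Callable, List, Set


def classify_boot(wait_for_any: Callable[[List[str], int], Set[str]],
                  timeout: int = 90) -> str:
    """Wait for either screen once, then report which state was matched."""
    # One wait covers both outcomes, so a missed boot menu does not have
    # to be recovered from a failed assertion afterwards.
    tags = wait_for_any(["grub2", "linux-login"], timeout)
    if "grub2" in tags:
        return "bootloader"
    if "linux-login" in tags:
        return "already-booted"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical stand-ins that pretend a needle with the given tag matched.
    print(classify_boot(lambda tags, timeout: {"grub2"}))
    print(classify_boot(lambda tags, timeout: {"linux-login"}))
```

The branch for the already-booted case is where a soft failure could be recorded instead of letting the module fail.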
Updated by SLindoMansilla over 5 years ago
- Status changed from Feedback to Resolved
PR merged.
Verified on OSD: https://openqa.suse.de/tests/2911377
Updated by favogt over 5 years ago
Unfortunately this workaround breaks booting after a live installation:
https://openqa.opensuse.org/tests/940347
It also matches a not yet rebooted system.
Updated by SLindoMansilla over 5 years ago
- Status changed from Resolved to Workable
jrauch, could you take a look?
Could it be fixed easily by applying the workaround only when it is not a live installation?
Updated by jorauch over 5 years ago
This is really strange, but IMHO that's not the fault of the workaround, as we should be way further at this point. Maybe we should fix the preceding module?
I personally would not put this in an "if not LIVECD" check, as we would not fix the cause.
Updated by SLindoMansilla over 5 years ago
Please note that the matched screenshot is wrong: https://openqa.opensuse.org/tests/940347#step/grub_test/1
This is not a booted textmode. This tty is shown because the system is shutting down, but the workaround thinks that the SUT is booted.
So, your module left the system in a bad state for the next module.
Updated by SLindoMansilla over 5 years ago
Let's revert the PR to not block other teams. We should provide a fixed PR.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7520
Updated by SLindoMansilla over 5 years ago
- Has duplicate action #51851: test fails in grub_test added
Updated by jorauch over 5 years ago
SLindoMansilla wrote:
Let's revert the PR to not block other teams. We should provide a fixed PR.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7520
I think grub_test should not even need to handle the shutdown of the system; its job is to look for grub. One of the preconditions of this test is that we have a booting system. If we do not get it here, we should ensure it gets it from the preceding test.
Updated by jorauch over 5 years ago
- Assignee deleted (jorauch)
Unassigning due to vacation and the high state of this ticket.
I still think that my workaround is legitimate, as in the failure cases the precondition of a shut-down system was not met.
Updated by SLindoMansilla over 5 years ago
As discussed some time ago, even if the problem we are trying to fix was in the module grub_test, there are implicit requirements for resolving a ticket.
The test result cannot be worse after applying a fix (e.g. a PR). If a fix makes a previously working (e.g. passed) scenario/test/module fail, either that other module has to be adapted for the fix, or the fix needs to be adapted for that module.
Updated by jorauch over 5 years ago
I would prefer setting this to blocked and creating a ticket that ensures this module gets a shut-down system.
IMHO we should fix the shady behaviour instead of keeping a broken workflow that somehow works accidentally but is wrong.
Updated by SLindoMansilla over 5 years ago
I agree with this proposal. Please create a ticket for fixing the previous module and set this ticket as blocked by it (with a proper link in a comment and in the section "related issues").
From my point of view, since it is a blocker of a ticket prioritized by our PO, it would automatically get the milestone and the Workable status, unless there is something to discuss.
You can confirm with the PO before starting to work on the ticket.
Updated by jorauch over 5 years ago
- Copied to coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Updated by jorauch over 5 years ago
- Copied to deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Updated by jorauch over 5 years ago
- Blocked by coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Updated by jorauch over 5 years ago
- Status changed from Workable to Blocked
Blocked by: https://progress.opensuse.org/issues/53249 as discussed above
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 25 to Milestone 26
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 26 to Milestone 28
Updated by mgriessmeier almost 5 years ago
- Target version changed from Milestone 28 to Milestone 31
Updated by mgriessmeier almost 5 years ago
- Blocked by deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Updated by mgriessmeier almost 5 years ago
- Status changed from Blocked to Rejected