action #49751

closed

[functional][u] test fails in grub_test - isotovideo missed the boot screen while worker-host was likely under heavy load

Added by szarate over 5 years ago. Updated almost 5 years ago.

Status: Rejected
Priority: High
Assignee:
Category: Enhancement to existing tests
Target version: SUSE QA (private) - Milestone 31
Start date: 2019-03-27
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-aarch64-textmode@aarch64 fails in grub_test

Test suite description

Maintainer: okurz

Installation in textmode and selecting the textmode "desktop" during installation.

Acceptance criteria

  • AC1: The test module doesn't fail if we miss the grub or tianocore menu
  • AC2: There is better visualization for the reviewer if we miss one of those screens

Suggestions

  • Wrap that assert_screen inside an eval so that, if it fails, we can still try to find out whether the system has already booted by checking if the login string is already present on the serial console. If it is, either softfail or fail the grub_test.
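
The fallback suggested above can be modelled as a small decision function. This is only a language-neutral sketch in Python, not the actual test code (openQA test modules such as grub_test.pm are written in Perl). The function name `classify_missed_bootloader`, the result strings, and the `login:` marker are hypothetical and chosen for illustration.

```python
def classify_missed_bootloader(screen_matched, serial_output, login_marker="login:"):
    """Decide the module result after waiting for the bootloader screen.

    screen_matched: True if the grub/tianocore needle matched in time.
    serial_output:  text captured from the serial console so far.
    login_marker:   assumed login prompt string (hypothetical default).
    """
    if screen_matched:
        # Normal case: the bootloader screen was seen.
        return "ok"
    if login_marker in serial_output:
        # Screen was missed, but the system appears to have booted anyway.
        return "softfail"
    # Neither the screen nor a login prompt was found.
    return "fail"


if __name__ == "__main__":
    print(classify_missed_bootloader(False, "Welcome to SLES\nlocalhost login: "))
```

A check like this is only meaningful if the serial output was captured after the reboot was triggered; a login prompt left over from before the reboot would otherwise be misread as a booted system.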

Early investigation

This is an easy one: the log below shows that the SUT was able to boot, but isotovideo was not fast enough to catch the boot screen. Note the repeated stall warnings and the needle checks that took more than 20 seconds each.


[2019-03-27T08:17:11.067 UTC] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/grub_test.pm:73 called utils::assert_screen_with_soft_timeout
[2019-03-27T08:17:11.068 UTC] [debug] <<< testapi::check_screen(mustmatch='grub2', timeout=90)
[2019-03-27T08:17:12.871 UTC] [debug] WARNING: check_asserted_screen took 1.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:12.871 UTC] [debug] no match: 89.9s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:14.279 UTC] [debug] WARNING: check_asserted_screen took 1.33 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:14.279 UTC] [debug] no match: 88.1s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:14.310 UTC] [debug] no change: 86.7s
[2019-03-27T08:17:50.564 UTC] [debug] WARNING: check_asserted_screen took 35.28 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.565 UTC] [debug] no match: 85.7s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:17:50.624 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 35.7843849658966 seconds
[2019-03-27T08:18:13.902 UTC] [debug] WARNING: check_asserted_screen took 23.24 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:13.914 UTC] [debug] no match: 50.4s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:13.937 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.3489730358124 seconds
[2019-03-27T08:18:15.429 UTC] [debug] WARNING: check_asserted_screen took 1.45 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:15.429 UTC] [debug] no match: 27.0s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:38.475 UTC] [debug] WARNING: check_asserted_screen took 23.00 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:38.488 UTC] [debug] no match: 25.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:38.517 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.0587909221649 seconds
[2019-03-27T08:18:39.363 UTC] [debug] WARNING: check_asserted_screen took 0.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:18:39.364 UTC] [debug] no match: 2.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:18:39.491 UTC] [debug] no change: 1.5s
[2019-03-27T08:19:00.836 UTC] [debug] WARNING: check_asserted_screen took 20.34 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:00.849 UTC] [debug] no match: 0.5s, best candidate: bootmenu-SLE-HPC-20180603 (0.00)
[2019-03-27T08:19:00.863 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 20.4782309532166 seconds
[2019-03-27T08:19:21.408 UTC] [debug] WARNING: check_asserted_screen took 20.52 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:19:22.816 UTC] [debug] >>> testapi::_check_backend_response: match=grub2 timed out after 90 (check_screen)
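
To quantify how much of the wait was lost to worker stalls, the warnings can be extracted from the log with a short script. This is a minimal sketch that assumes the message wording shown in the log above; the helper names `stall_seconds` and `slow_checks` and the 5-second threshold are hypothetical.

```python
import re

# Sample lines copied from the log above.
LOG = """\
[2019-03-27T08:17:12.871 UTC] [debug] WARNING: check_asserted_screen took 1.80 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.564 UTC] [debug] WARNING: check_asserted_screen took 35.28 seconds for 42 candidate needles - make your needles more specific
[2019-03-27T08:17:50.624 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 35.7843849658966 seconds
[2019-03-27T08:18:13.937 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.3489730358124 seconds
[2019-03-27T08:18:38.517 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 23.0587909221649 seconds
[2019-03-27T08:19:00.863 UTC] [debug] WARNING: There is some problem with your environment, we detected a stall for 20.4782309532166 seconds
"""

STALL_RE = re.compile(r"detected a stall for (\d+(?:\.\d+)?) seconds")
CHECK_RE = re.compile(r"check_asserted_screen took (\d+(?:\.\d+)?) seconds")


def stall_seconds(log_text):
    """Return the duration of every reported stall, in seconds."""
    return [float(m.group(1)) for m in STALL_RE.finditer(log_text)]


def slow_checks(log_text, threshold=5.0):
    """Return the needle-check durations that exceed the threshold (seconds)."""
    durations = (float(m.group(1)) for m in CHECK_RE.finditer(log_text))
    return [d for d in durations if d > threshold]


if __name__ == "__main__":
    stalls = stall_seconds(LOG)
    print(f"{len(stalls)} stalls, {sum(stalls):.1f} s stalled in total")
    print(f"slow needle checks: {slow_checks(LOG)}")
```

For the four stall warnings in this log, the reported stall time adds up to more than the 90-second timeout of the check_screen call, which supports the conclusion that the worker host, not the SUT, caused the failure.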

Reproducible

Fails since (at least) Build 198.1 (current job)

Expected result

Last good: 196.1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Has duplicate: openQA Tests (public) - action #51851: test fails in grub_test (Rejected, 2019-05-22)

Actions #1

Updated by mgriessmeier over 5 years ago

  • Description updated (diff)
  • Category changed from Bugs in existing tests to Enhancement to existing tests
  • Status changed from New to Workable
  • Target version set to Milestone 24
Actions #3

Updated by jorauch over 5 years ago

  • Assignee set to jorauch

Taking over

Actions #4

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 24 to Milestone 25

move to M25

Actions #6

Updated by jorauch over 5 years ago

  • Status changed from Workable to Feedback
Actions #7

Updated by jorauch over 5 years ago

We have:
https://openqa.suse.de/tests/2901056#
as test for the PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7472

Apparently 'eval' does not stop the test from failing. Maybe we should run a multi-tag assert_screen and use 'match_has_tag' to check where we are.

Actions #8

Updated by SLindoMansilla over 5 years ago

  • Status changed from Feedback to Resolved

PR merged.

Verified on OSD: https://openqa.suse.de/tests/2911377

Actions #9

Updated by favogt over 5 years ago

Unfortunately this workaround breaks booting after a live installation:

https://openqa.opensuse.org/tests/940347

It also matches a not yet rebooted system.

Actions #10

Updated by SLindoMansilla over 5 years ago

  • Status changed from Resolved to Workable

jrauch, could you take a look?

Would it be easily fixable by only doing that when it is not a live installation?

Actions #11

Updated by jorauch over 5 years ago

This is really strange, but IMHO that's not the fault of the workaround, as we should be way further along at this point. Maybe we should fix the preceding module?
I personally would not wrap this in an "if not LIVECD" check, as that would not fix the cause.

Actions #12

Updated by SLindoMansilla over 5 years ago

Please notice how this matched screenshot is wrong: https://openqa.opensuse.org/tests/940347#step/grub_test/1

This is not a booted textmode system; this tty is shown because the system is shutting down. However, the workaround thinks that the SUT is booted.
So, your module left the system in a bad state for the next module.

Actions #13

Updated by SLindoMansilla over 5 years ago

Let's revert the PR to not block other teams. We should provide a fixed PR.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7520

Actions #14

Updated by SLindoMansilla over 5 years ago

Actions #15

Updated by okurz over 5 years ago

revert PR merged

Actions #16

Updated by jorauch over 5 years ago

SLindoMansilla wrote:

Let's revert the PR to not block other teams. We should provide a fixed PR.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7520

I think grub_test should not even need to handle the shutdown of the system; its job is to look for grub. One of the preconditions of this test is that we have a booting system. If we do not get it here, we should ensure it gets it from the preceding test.

Actions #17

Updated by jorauch over 5 years ago

  • Assignee deleted (jorauch)

Unassigning due to vacation and the high priority of this ticket.
I still think that my workaround is legitimate, as in the failure cases the precondition of a shut-down system was not met.

Actions #18

Updated by jorauch over 5 years ago

  • Assignee set to jorauch

Taking back over after vacation

Actions #19

Updated by SLindoMansilla over 5 years ago

As discussed some time ago, even if the problem we are trying to fix was in the module grub_test, there are implicit requirements for resolving a ticket.
The test result cannot be worse after applying a fix (e.g. a PR). If a fix makes a previously working (e.g. passed) scenario/test/module fail, either that other module has to be adapted for the fix, or the fix needs to be adapted for that module.

Actions #20

Updated by jorauch over 5 years ago

I would prefer setting this to blocked and creating a ticket that ensures this module gets a shut-down system.
IMHO we should fix the shady behaviour instead of keeping a broken workflow that somehow works by accident but is wrong.

Actions #21

Updated by SLindoMansilla over 5 years ago

I agree with this proposal. Please create a ticket for fixing the previous module and set this ticket as blocked by it (with a proper link in a comment and in the "related issues" section).

From my point of view, since it blocks a ticket prioritized by our PO, it would automatically get the milestone and the Workable status, unless there is something to discuss.
You can confirm with the PO before starting to work on the ticket.

Actions #22

Updated by jorauch over 5 years ago

  • Copied to coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Actions #23

Updated by jorauch over 5 years ago

  • Copied to deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Actions #24

Updated by jorauch over 5 years ago

  • Blocked by coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Actions #25

Updated by jorauch over 5 years ago

  • Status changed from Workable to Blocked

Blocked by: https://progress.opensuse.org/issues/53249 as discussed above

Actions #26

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 25 to Milestone 26
Actions #27

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 26 to Milestone 28
Actions #28

Updated by mgriessmeier almost 5 years ago

  • Target version changed from Milestone 28 to Milestone 31
Actions #29

Updated by mgriessmeier almost 5 years ago

  • Blocked by deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Actions #30

Updated by mgriessmeier almost 5 years ago

  • Status changed from Blocked to Rejected