action #42632

closed

[functional][y][fast] test fails in bootloader_uefi - it doesn't get into GRUB - SUT already running while checking data integrity

Added by zluo over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 20
Start date: 2018-10-17
Due date: 2018-10-23
% Done: 0%
Estimated time:
Difficulty:

Description

Compared with the last successful test run, we now have a problem with the bootloader. I can see that the installer tries to come up, but it does not make it into GRUB.

A stall was detected during the failing assert_screen, and the log shows "# wait_serial expected: 'SysRq : Show Blocked State'" as found.

Observation

openQA test in scenario sle-12-SP4-Server-DVD-aarch64-RAID0@aarch64 fails in bootloader_uefi

Reproducible

Fails since (at least) Build 0426 (current job)

Expected result

Last good: 0421 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 1 (0 open, 1 closed)

Related to openQA Tests - action #41414: [functional][y][tools] Checksum of the images is verified on the worker side (Resolved, riafarov, 2016-08-22 to 2019-03-12)

#1

Updated by mgriessmeier over 5 years ago

  • Priority changed from Normal to Urgent

Restarted the old last-good job, but it is still scheduled: https://openqa.suse.de/tests/2182897
Raising priority.

#2

Updated by szarate over 5 years ago

The culprit seems to be high load on this worker: it got 50 jobs allocated at once, causing higher load on these machines (https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=0426&groupid=139&arch=aarch64), which resulted in slower needle checking. However, changes in the product might also be part of the problem; I will check further.

On the other hand, the number of workers should be revisited, as the machine is already struggling a bit.

#3

Updated by dheidler over 5 years ago

  • Assignee set to JERiveraMoya

This is related to the new data_integrity test module: calculating the ISO checksum costs us about 1 minute while the SUT is already running, which makes the bootloader time out. See also the timestamps in the log and the video.

This also happens on x86_64: https://openqa.suse.de/tests/2178161#step/bootloader/37
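To illustrate why the check is slow: verifying an image means reading and hashing every byte of it, so the time grows with the image size and with the load on the worker. The following is a minimal Python sketch of a streaming checksum, not the actual data_integrity module (which is Perl code in os-autoinst-distri-opensuse); the function names, SHA-256 and the 1 MiB chunk size are assumptions for illustration.

```python
import hashlib
import time


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Stream a file through a hash in fixed-size chunks.

    Returns (hexdigest, seconds_elapsed). Memory use stays constant,
    but run time grows linearly with the file size because every byte
    must be read and hashed.
    """
    digest = hashlib.new(algorithm)
    start = time.monotonic()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), time.monotonic() - start


def verify(path, expected_hex, algorithm="sha256"):
    """Compare a file's checksum with an expected hex digest.

    Returns (matches, seconds_elapsed).
    """
    actual, elapsed = file_checksum(path, algorithm)
    return actual == expected_hex.lower(), elapsed
```

For example, verify("SLE-12-SP4-Server-DVD-aarch64.iso", expected) (a hypothetical file name) returns whether the digest matches plus the seconds spent; if the SUT's CPU is running during that time, the bootloader timeout keeps counting down.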

#4

Updated by szarate over 5 years ago

  • Assignee changed from JERiveraMoya to szarate

That explains a lot more (including why it didn't happen before). Anyway, I think calling freeze_vm within the data_integrity module, and simply calling resume_vm after it has finished, should do the trick. Before that, however, changes need to be made at the os-autoinst level to support this, as freeze_vm is meant to be called from post_fail_hook.

#5

Updated by okurz over 5 years ago

  • Subject changed from test fails in bootloader_uefi - it doesn't get into GRUB to [functional][y][fast] test fails in bootloader_uefi - it doesn't get into GRUB - SUT already running while checking data integrity
  • Due date set to 2018-10-23
  • Target version set to Milestone 20

If we are talking about changes to the backend, I would go for making sure of what I expected anyway: we should only trigger isosize and the data integrity module before starting the actual SUT.

I suggest that @JERiveraMoya take this ticket, as he introduced the data integrity check. For a start, we could remove the test module from the schedule again and then, less urgently, work on making sure that we can trigger some test/check steps before the SUT is started.

#6

Updated by szarate over 5 years ago

@okurz I can submit a PR to allow freeze_vm to be called at any point. However, if the test has DELAYED_START=1, the qemu backend will simply not start the CPU (or rather, it will be paused), so a call to resume_vm() is required after the integrity check is done.
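The ordering discussed here (freeze if running, check, then always resume) can be sketched as a context manager. This is a hypothetical Python model only: FakeVM, sut_paused and run_integrity_check are illustrative names, while the real calls are the Perl testapi functions freeze_vm and resume_vm in os-autoinst. The sketch shows that with DELAYED_START=1 no freeze is needed, but a resume is still required to start the SUT.

```python
from contextlib import contextmanager


class FakeVM:
    """Hypothetical stand-in for a backend VM, recording calls in order."""

    def __init__(self, delayed_start=False):
        # Models DELAYED_START=1: the CPU exists but starts paused.
        self.running = not delayed_start
        self.log = []

    def freeze(self):
        self.running = False
        self.log.append("freeze")

    def resume(self):
        self.running = True
        self.log.append("resume")


@contextmanager
def sut_paused(vm):
    """Keep the SUT's CPU stopped while the body runs, then resume it.

    If the VM is already paused (delayed start), skip the freeze; the
    resume afterwards is what actually starts the SUT.
    """
    if vm.running:
        vm.freeze()
    try:
        yield vm
    finally:
        # Resume even if the check raised, so the job does not sit with
        # a stopped CPU until the overall job timeout.
        vm.resume()


def run_integrity_check(vm, check):
    """Run check() while the SUT is paused and return its result."""
    with sut_paused(vm):
        vm.log.append("check")
        return check()
```

Resuming in the finally block is a design choice of this sketch: a failed check still lets the job continue (and fail visibly) instead of hanging with a paused CPU.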

#7

Updated by okurz over 5 years ago

  • Related to action #41414: [functional][y][tools] Checksum of the images is verified on the worker side added
#8

Updated by okurz over 5 years ago

  • Status changed from New to Feedback

With https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5982, riafarov removed the test module from the schedule for now, and we should rethink the process. Reopened #42632.

#9

Updated by okurz over 5 years ago

@szarate Freezing the VM will not help the bare-metal tests, which I guess can be affected in the same way.

#11

Updated by okurz over 5 years ago

What we could do is schedule the test module only in a dedicated test suite that checks asset integrity. It would help show whether the medium is fine on the openQA instance itself and at least within the cache of one worker.

#12

Updated by szarate over 5 years ago

#13

Updated by szarate over 5 years ago

OK, so I am going with freeze_vm -> check integrity -> resume_vm for qemu backends. I will use poo#41414 to better define what to do, since this could be done in the cache service (https://github.com/os-autoinst/openQA/pull/1783), as an operation that can be executed before the tests are even initialized.
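Verifying in the cache instead can be sketched as hashing the asset while it is copied into the worker cache, so the check costs no extra read pass and finishes before any SUT exists. This is a hypothetical Python sketch, not the openQA cache service's actual code; cache_asset, ChecksumMismatch, the ".part" suffix and SHA-256 are assumptions for illustration.

```python
import hashlib
import os


class ChecksumMismatch(Exception):
    """Raised when a cached asset does not match its expected digest."""


def cache_asset(src, cache_dir, expected_sha256, chunk_size=1024 * 1024):
    """Copy an asset into a cache directory, hashing it on the way.

    Verification happens during the copy, before any test (or SUT)
    starts. On mismatch the partial file is removed and an exception
    is raised, so a job never sees an unverified asset.
    """
    os.makedirs(cache_dir, exist_ok=True)
    dest = os.path.join(cache_dir, os.path.basename(src))
    tmp = dest + ".part"
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
            fout.write(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        os.remove(tmp)
        raise ChecksumMismatch(f"{src}: got {digest.hexdigest()}")
    # Rename only after verification, so the final name always
    # refers to a complete, checked file.
    os.replace(tmp, dest)
    return dest
```

Unlike the freeze/resume approach, this works the same for every backend, including bare metal, because it never touches the SUT.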

#14

Updated by szarate over 5 years ago

  • Status changed from Feedback to In Progress
#15

Updated by szarate over 5 years ago

  • Status changed from In Progress to Feedback
  • Priority changed from Urgent to Normal

PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6008

Setting priority to Normal, since testing is no longer blocked. Setting to Feedback until os-autoinst is deployed.

#16

Updated by okurz over 5 years ago

I like your idea of doing it in the cache service better than freezing the VM, which works only for some backends, as suggested in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6008.

#17

Updated by szarate over 5 years ago

  • Status changed from Feedback to Resolved

I consider this ticket resolved by riafarov in #note-10. Check poo#41414 for more details.