action #42632
closed[functional][y][fast] test fails in bootloader_uefi - it doesn't get into GRUB - SUT already running while checking data integrity
Description
Compared with the last successful test run, we now have a problem with the bootloader. I can see that the installer tries to come up, but it does not make it into GRUB.
A stall was detected when assert_screen failed, together with: # wait_serial expected: 'SysRq : Show Blocked State' found
Observation
openQA test in scenario sle-12-SP4-Server-DVD-aarch64-RAID0@aarch64 fails in
bootloader_uefi
Reproducible
Fails since (at least) Build 0426 (current job)
Expected result
Last good: 0421 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mgriessmeier over 5 years ago
- Priority changed from Normal to Urgent
Restarted the old last-good job, but it is still scheduled: https://openqa.suse.de/tests/2182897
Raising priority.
Updated by szarate over 5 years ago
The culprit is rather high load on this worker: it got 50 jobs allocated at once, which caused higher load on these machines (https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=0426&groupid=139&arch=aarch64) and resulted in slower needle checking. However, changes in the product might also be part of the problem; I will check further.
On the other hand, the number of workers should be revisited, as the machine is already suffering a bit.
Updated by dheidler over 5 years ago
- Assignee set to JERiveraMoya
This is related to the new data_integrity test, which costs us ~1 minute calculating the ISO checksum while the SUT is already running.
This makes the bootloader time out. See also the timestamps in the log and the video.
This also happens on x86_64: https://openqa.suse.de/tests/2178161#step/bootloader/37
Updated by szarate over 5 years ago
- Assignee changed from JERiveraMoya to szarate
That explains a lot more (why it didn't happen before). Anyway, I think that calling freeze_vm within the data_integrity module and then resume_vm once it has finished should do the trick. Before that, changes need to be made at the os-autoinst level to support this, as freeze_vm is meant to be called from the post_fail_hook level.
Updated by okurz over 5 years ago
- Subject changed from test fails in bootloader_uefi - it doesn't get into GRUB to [functional][y][fast] test fails in bootloader_uefi - it doesn't get into GRUB - SUT already running while checking data integrity
- Due date set to 2018-10-23
- Target version set to Milestone 20
And if we are talking about changes to the backend, I would go for ensuring what I had expected in the first place: we should only trigger isosize and the data integrity module before starting the actual SUT.
I suggest that @JERiveraMoya takes this ticket, as he introduced the data integrity check. For a start we could remove the test module from the schedule again and then, with less urgency, work on making sure that we can trigger some test/check steps before the SUT is started.
Updated by szarate over 5 years ago
@okurz I can submit the PR to enable freeze_vm to be called at any point. However, if the test has DELAYED_START=1, then the qemu backend will simply not start the CPU (or rather, it will be paused), so a call to resume_vm() is required after the integrity check is done.
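The ordering discussed above (pause the SUT, run the check, resume) could be sketched roughly as follows. This is a Python illustration only, not os-autoinst code: `FakeVM`, `paused_vm`, `run_check_while_paused` and the `delayed_start` flag are hypothetical names standing in for the freeze_vm/resume_vm calls and the DELAYED_START behaviour described in this ticket.

```python
from contextlib import contextmanager


class FakeVM:
    """Hypothetical stand-in for a qemu-backed SUT; not the os-autoinst API."""

    def __init__(self, delayed_start=False):
        # Assumption taken from the ticket: with DELAYED_START=1 the CPU starts paused.
        self.running = not delayed_start
        self.events = []

    def freeze(self):
        # Only pause (and record it) if the VM is actually running.
        if self.running:
            self.running = False
            self.events.append("freeze")

    def resume(self):
        # Only resume (and record it) if the VM is currently paused.
        if not self.running:
            self.running = True
            self.events.append("resume")


@contextmanager
def paused_vm(vm):
    """Keep the VM paused for the duration of the block, then always resume it."""
    vm.freeze()  # no-op when the VM is already paused (delayed start)
    try:
        yield vm
    finally:
        vm.resume()  # needed in both cases so the bootloader can proceed


def run_check_while_paused(vm, check):
    """Run `check` (any callable) while the VM is guaranteed to be paused."""
    with paused_vm(vm):
        vm.events.append("check")
        return check()
```

With a running VM the recorded order is freeze, check, resume; with a delayed start there is no freeze step, but the resume still happens. The `finally` clause means the SUT is resumed even if the check raises. As noted further down in the thread, this approach only applies to backends that can pause the SUT.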
Updated by okurz over 5 years ago
- Related to action #41414: [functional][y][tools] Checksum of the images is verified on the worker side added
Updated by okurz over 5 years ago
- Status changed from New to Feedback
With https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5982 riafarov removed the test module from the schedule for now and we should rethink the process. Reopened #42632
Updated by okurz over 5 years ago
@szarate Freezing the VM will not help the bare-metal tests, which I guess can be affected in the same way.
Updated by riafarov over 5 years ago
PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5982 to not schedule the test module for now.
Updated by okurz over 5 years ago
What we could do is to schedule the test module only in a specific test suite which just checks the asset integrity. It will help to show whether the medium is fine on the openQA instance itself and at least within the cache of one worker.
Updated by szarate over 5 years ago
The PR is already merged (not yet deployed): https://github.com/os-autoinst/os-autoinst/pull/1043/files
Updated by szarate over 5 years ago
OK, so the plan is freeze_vm -> check integrity -> resume_vm for qemu backends. I'm going to use poo#41414 to better define what to do, since this could be done in the cache service (https://github.com/os-autoinst/openQA/pull/1783) and is an operation that can be executed before the tests are even initialized.
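As a rough illustration of the alternative mentioned here, verifying the asset before the tests are initialized, the following Python sketch checks a checksum first and only then starts the SUT. It does not reflect the actual cache service implementation; `sha256_of`, `asset_is_intact`, `start_job` and the `start_sut` callback are hypothetical names, and the use of SHA-256 is an assumption.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large ISO is never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def asset_is_intact(path, expected_sha256):
    """Return True if the cached asset matches the expected checksum."""
    return sha256_of(path) == expected_sha256.strip().lower()


def start_job(asset_path, expected_sha256, start_sut):
    """Verify first, then start the SUT, so the check never overlaps with boot."""
    if not asset_is_intact(asset_path, expected_sha256):
        raise ValueError(f"checksum mismatch for {asset_path}")
    return start_sut()
```

Because the check completes before `start_sut` is called, the time spent hashing cannot eat into the bootloader timeout, and the same ordering works for backends that cannot pause the SUT.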
Updated by szarate over 5 years ago
- Status changed from Feedback to In Progress
Updated by szarate over 5 years ago
- Status changed from In Progress to Feedback
- Priority changed from Urgent to Normal
PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6008
Setting priority to Normal, since testing is not blocked anymore. Setting to Feedback until os-autoinst is deployed.
Updated by okurz over 5 years ago
I like your idea of doing it in the caching better than freezing the VM, which would work only for some backends, as suggested in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6008 .
Updated by szarate over 5 years ago
- Status changed from Feedback to Resolved
I consider the ticket resolved by riafarov in #note-10. Check poo#41414 for more details.