action #33586
closed
[sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown
0%
Description
Observation
openQA test in scenario sle-15-Installer-DVD-x86_64-lvm+RAID1@svirt-hyperv fails in grub_test.
Perhaps increasing the timeout could help.
The main goal of this ticket is to gather statistics and to think about what to log to make investigation easier. Try out whether increasing the timeout helps.
Reproducible
Fails since (at least) Build 527.1 (current job)
Expected result
Last good: 522.1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by michalnowak over 6 years ago
- Subject changed from [sle][functional] test fails in grub_test - timeout 11k seems to be not enought for hiperv on installation to [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown
The VM was probably unable to reboot. The last screen shows the YaST reboot countdown, where it got stuck for 3 * 180 seconds. No clue in the logs.
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown
- Due date set to 2018-03-27
- Target version set to Milestone 15
Updated by riafarov over 6 years ago
- Subject changed from [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown
- Description updated (diff)
- Status changed from New to Workable
Updated by riafarov over 6 years ago
We have had a lot of issues with Hyper-V recently, please communicate with mnowak.
Updated by riafarov over 6 years ago
- Subject changed from [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown
I've removed the yast tag, as I'm not sure how this is a YaST problem. Moving to the common backlog.
Updated by SLindoMansilla over 6 years ago
- Status changed from Workable to In Progress
- Assignee set to SLindoMansilla
Updated by SLindoMansilla over 6 years ago
Scheduled 10 jobs on OSD (waiting for them to finish):
- https://openqa.suse.de/tests/1567222
- https://openqa.suse.de/tests/1567221
- https://openqa.suse.de/tests/1567220
- https://openqa.suse.de/tests/1567219
- https://openqa.suse.de/tests/1567218
- https://openqa.suse.de/tests/1567217
- https://openqa.suse.de/tests/1567216
- https://openqa.suse.de/tests/1567215
- https://openqa.suse.de/tests/1567214
- https://openqa.suse.de/tests/1567213
Updated by SLindoMansilla over 6 years ago
3 of those jobs failed earlier, on partitioning_raid, because the expected item was not selected. Restarting those three jobs to see if they pass or fail on grub_test.
- https://openqa.suse.de/tests/1572286
- https://openqa.suse.de/tests/1572287
- https://openqa.suse.de/tests/1572288
I think the problem in grub_test could be caused by missing keys, as this also happened in those three jobs.
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-03-27 to 2018-04-10
- Status changed from In Progress to Workable
- Assignee deleted (SLindoMansilla)
unassigning from sergio since he's on vacation
next steps to do:
- check statistics of mentioned jobs
Updated by oorlov over 6 years ago
- Status changed from Workable to In Progress
- Assignee set to oorlov
Updated by oorlov over 6 years ago
- Subject changed from [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown
- Status changed from In Progress to Workable
- Assignee deleted (oorlov)
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown
- Assignee set to oorlov
As discussed in "sprint planning meeting II", assigning back to oorlov; this could be a good learning opportunity. Just gather information from jobs on our production instances, do not code anything locally, and do not even try to clone anything or set up your own worker.
Updated by oorlov over 6 years ago
From 136 test runs:
- 2 failed on 'grub_test' (1578836, 1578795);
- 14 failed earlier, on 'bootloader_hyperv';
- 12 also failed earlier, on 'partitioning_raid';
- 23 were interrupted automatically after 3 hours of test run, also before 'grub_test'.
So 87 runs (136 - 14 - 12 - 23) actually reached 'grub_test', and 2 of them failed there.
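The arithmetic behind these numbers can be checked with a short sketch. It only uses the counts reported above; the variable names are chosen for illustration.

```python
# Counts as reported in this comment.
total_runs = 136
stopped_before_grub_test = {
    "bootloader_hyperv": 14,
    "partitioning_raid": 12,
    "interrupted_after_3h": 23,
}
grub_test_failures = 2

# Runs that got far enough to execute grub_test at all.
reached_grub_test = total_runs - sum(stopped_before_grub_test.values())
failure_rate = grub_test_failures / reached_grub_test

print(f"{reached_grub_test} runs reached grub_test")
print(f"grub_test failure rate: {failure_rate:.1%}")
```

With 2 failures in 87 runs the rate is roughly 2.3%, which fits the [sporadic] tag.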
Also, after investigating the video and logs of the failed tests, I noticed that the test actually fails in the previous module (reboot_after_installation). It sends 'alt-o' to reboot the system, but the "The system will reboot now..." popup is not closed. Then, after 10 seconds of waiting, 'grub_test' starts, but the popup is still there.
So my assumption is that the popup is stuck in the system itself: it has a countdown timer, but its value does not change after 10 seconds (it stays at '7', or sometimes at '8').
To make sure that this is an issue in the system and not in our tests, I would add more logging and send 'alt-o' several times, with appropriate validation, in the 'reboot_after_installation' test to determine whether the system is stuck or not.
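A minimal sketch of that retry-and-log idea, written in Python for illustration only (the real openQA test modules are Perl). `confirm_reboot`, `send_key`, `dialog_visible` and the default values are hypothetical names and assumptions, not existing test API calls; the two callbacks stand in for whatever sends a key to the SUT and checks whether the popup is still on screen.

```python
import time
from typing import Callable


def confirm_reboot(
    send_key: Callable[[str], None],      # hypothetical: sends a hotkey to the SUT
    dialog_visible: Callable[[], bool],   # hypothetical: True while the popup is shown
    log: Callable[[str], None] = print,
    attempts: int = 3,                    # assumed retry count
    wait_seconds: float = 10.0,           # assumed wait between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send 'alt-o' up to `attempts` times; return True once the popup is gone."""
    for attempt in range(1, attempts + 1):
        log(f"attempt {attempt}: sending alt-o")
        send_key("alt-o")
        sleep(wait_seconds)
        if not dialog_visible():
            log(f"popup closed after attempt {attempt}")
            return True
        log(f"popup still visible after attempt {attempt}")
    return False
```

If the popup only closes on a later attempt, a lost key press is the more likely cause; if it is still there after every attempt, the system itself is probably stuck.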
Updated by SLindoMansilla over 6 years ago
Hi oorlov,
Our installation test suites always stop the timer using the 'stop' button.
This is done to have enough time to gather logs before rebooting the system.
After gathering logs, we come back to the installation window and "press" the 'ok' button to restart the system.
So it is expected that the counter is stopped. What is not expected is that the 'ok' button is not pressed. I assume sporadic missing keys (the hotkey 'alt-o' is not received by the SUT, or is received too early, when the graphical control is not yet ready).
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-10 to 2018-04-24
Suggestions
- Play around with increasing the timeout and try to gather more statistics
- Check whether missing keys are in play
- Keep TIMEOUT_SCALE in mind when trying to reproduce locally
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-24 to 2018-05-08
- Target version changed from Milestone 15 to Milestone 16
Clarify with Michal how to continue here.
Updated by michalnowak over 6 years ago
mgriessmeier wrote:
clarify with Michal how to continue here
We don't have enough data to root-cause this, and as we haven't seen it lately... I suggest you close this.
Updated by oorlov over 6 years ago
- Status changed from In Progress to Rejected
Closed as it has not been reproduced for a long time.