action #33586

closed

[sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown

Added by JERiveraMoya over 6 years ago. Updated over 6 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Bugs in existing tests
Start date: 2018-03-21
Due date: 2018-05-08
% Done: 0%
Estimated time:
Difficulty: medium

Description

Observation

openQA test in scenario sle-15-Installer-DVD-x86_64-lvm+RAID1@svirt-hyperv fails in grub_test

Perhaps increasing the timeout could help.

The main goal of this ticket is to gather statistics and to think about what to log to make investigation easier. Also try out whether increasing the timeout helps.

Reproducible

Fails since (at least) Build 527.1 (current job)

Expected result

Last good: 522.1 (or more recent)

Further details

Always latest result in this scenario: latest

Actions #1

Updated by michalnowak over 6 years ago

  • Subject changed from [sle][functional] test fails in grub_test - timeout 11k seems to be not enought for hiperv on installation to [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown

The VM was probably unable to reboot. The last screen is the YaST reboot countdown, where it stayed stuck for 3 * 180 seconds. No clue in the logs.

Actions #2

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown
  • Due date set to 2018-03-27
  • Target version set to Milestone 15

Actions #3

Updated by riafarov over 6 years ago

  • Subject changed from [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Description updated (diff)
  • Status changed from New to Workable

Actions #4

Updated by riafarov over 6 years ago

We have a lot of issues with hyperV recently, please communicate with mnowak.

Actions #5

Updated by riafarov over 6 years ago

  • Subject changed from [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown

I've removed the yast tag; I'm not sure how this is a yast problem. Moving to the common backlog.

Actions #6

Updated by SLindoMansilla over 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla

Actions #8

Updated by SLindoMansilla over 6 years ago

3 of those jobs previously failed on partitioning_raid because the expected item was not selected. Restarting those three jobs to see if they pass or fail on grub_test.

I think the problem on grub_test could be caused by missing keys as this also happened on those three.

Actions #9

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-03-27 to 2018-04-10
  • Status changed from In Progress to Workable
  • Assignee deleted (SLindoMansilla)

unassigning from sergio since he's on vacation

next steps to do:

  • check statistics of mentioned jobs

Actions #10

Updated by oorlov over 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to oorlov

Actions #11

Updated by oorlov over 6 years ago

  • Subject changed from [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Status changed from In Progress to Workable
  • Assignee deleted (oorlov)

Actions #12

Updated by cwh over 6 years ago

  • Difficulty set to medium

Actions #13

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Assignee set to oorlov

As discussed in "sprint planning meeting II", assigning back to oorlov; this could be a good learning opportunity. Just gather information from jobs on our production instances. Do not code anything locally, and do not even try to clone anything or set up your own worker.

Actions #14

Updated by oorlov over 6 years ago

  • Status changed from Workable to In Progress

Actions #15

Updated by oorlov over 6 years ago

From 136 test runs:

  • 2 failed on the 'grub_test' (1578836, 1578795);
  • 14 failed earlier, on 'bootloader_hyperv';
  • 12 failed earlier also, on 'partitioning_raid';
  • 23 interrupted automatically after 3 hours of test run, also before the 'grub_test'

So, the result is 2 'grub_test' failures in the 87 runs that actually reached 'grub_test' (136 - 14 - 12 - 23 = 87).
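
To make the arithmetic above easy to recheck when more jobs are counted, here is a minimal Python sketch. The numbers are the ones from this comment; the variable names are only illustrative.

```python
# Counts taken from the statistics in this comment.
total_runs = 136
failed_before_grub_test = {
    "bootloader_hyperv": 14,
    "partitioning_raid": 12,
    "interrupted after 3 hours": 23,
}
grub_test_failures = 2

# Only runs that got far enough to execute grub_test are relevant.
reached_grub_test = total_runs - sum(failed_before_grub_test.values())
failure_rate = grub_test_failures / reached_grub_test

print(f"{reached_grub_test} runs reached grub_test")
print(f"grub_test failure rate: {failure_rate:.1%}")
```

With these counts the failure rate comes out at roughly 2.3%, which fits the "sporadic" tag in the subject.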

Also, after investigating the video and logs of the failed tests, I noticed that the test actually fails in the previous module (reboot_after_installation). It sends 'alt-o' to reboot the system, but the "The system will reboot now..." popup is not closed. Then, after 10 seconds of waiting, 'grub_test' starts, but the popup is still there.

So, my assumption is that the popup is stuck in the system itself: it has a countdown timer, but its value does not change after 10 seconds (it gets stuck on '7', or sometimes on '8').

To make sure that this is an issue in the system and not in our tests, I would add more logging and send 'alt-o' several times, with the appropriate validations, in the 'reboot_after_installation' test, to see whether the system is really stuck or not.
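
The retry-with-validation idea could look roughly like the sketch below. It is plain Python, not openQA test code: `send_key`, `popup_visible` and `log` are hypothetical callables standing in for whatever the test framework provides, and the default of 3 attempts is an assumption. Only the 'alt-o' key and the 10 second wait come from the observation above.

```python
import time
from typing import Callable


def press_until_dismissed(
    send_key: Callable[[str], None],      # hypothetical: sends a key to the SUT
    popup_visible: Callable[[], bool],    # hypothetical: True while the popup is shown
    log: Callable[[str], None] = print,   # hypothetical logging hook
    key: str = "alt-o",
    attempts: int = 3,
    wait_seconds: float = 10.0,
) -> bool:
    """Send `key` up to `attempts` times; return True once the popup is gone."""
    for attempt in range(1, attempts + 1):
        log(f"attempt {attempt}/{attempts}: sending {key}")
        send_key(key)
        time.sleep(wait_seconds)
        if not popup_visible():
            log(f"popup closed after attempt {attempt}")
            return True
        log(f"popup still visible after attempt {attempt}")
    log("popup never closed")
    return False
```

If the popup closes only on a later attempt, that points to a lost key press. If it never closes, the system itself is more likely to be stuck. Either way the log lines record which case was hit.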

Actions #16

Updated by SLindoMansilla over 6 years ago

Hi oorlov,

Our installation test suites always stop the timer using the 'stop' button.
This is done to have enough time to gather logs before rebooting the system.
After gathering logs, we come back to the installation window and "press" the 'ok' button to restart the system.

It is therefore expected that the counter is stopped. What is not expected is that the 'ok' button is not pressed. I assume sporadic missing keys (the hotkey 'alt-o' is not received by the SUT, or is received too early, when the graphical control is not yet ready).

Actions #17

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

Suggestions

  • Play around with increasing timeout and try to gather more statistics
  • check if missing keys are in play
  • keep TIMEOUT_SCALE in mind when trying to reproduce locally
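
As a reminder of why TIMEOUT_SCALE matters when comparing local and production runs, here is a small Python sketch. It assumes the setting acts as a plain multiplier on each wait; the helper name `scaled_timeout` is made up for illustration, and the 3 * 180 seconds are the waits reported in comment #1.

```python
def scaled_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Return the effective wait, assuming a plain multiplicative scale."""
    if timeout_scale <= 0:
        raise ValueError("timeout_scale must be positive")
    return base_seconds * timeout_scale


# Three 180-second waits, as reported in comment #1.
waits = [180, 180, 180]
for scale in (1, 2, 3):
    total = sum(scaled_timeout(w, scale) for w in waits)
    print(f"TIMEOUT_SCALE={scale}: up to {total:.0f} s on the countdown screen")
```

Under that assumption, a local run with a different scale waits a different total time than production, so the two are not directly comparable.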
Actions #18

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-04-24 to 2018-05-08
  • Target version changed from Milestone 15 to Milestone 16

clarify with Michal how to continue here

Actions #19

Updated by michalnowak over 6 years ago

mgriessmeier wrote:

clarify with Michal how to continue here

We don't have enough data to root-cause this, and as we haven't seen it lately... I suggest you close this.

Actions #20

Updated by oorlov over 6 years ago

  • Status changed from In Progress to Rejected

Closed, as it has not been reproduced for a long time.
