action #33586

[sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown

Added by JERiveraMoya about 2 years ago. Updated about 2 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Bugs in existing tests
Start date: 2018-03-21
Due date: 2018-05-08
% Done: 0%
Estimated time:
Difficulty: medium
Duration: 35

Description

Observation

openQA test in scenario sle-15-Installer-DVD-x86_64-lvm+RAID1@svirt-hyperv fails in grub_test

Perhaps increasing the timeout could help.

The main goal of this ticket is to gather statistics and to think about what to log to make investigation easier. Try out whether increasing the timeout helps.

Reproducible

Fails since (at least) Build 527.1 (current job)

Expected result

Last good: 522.1 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by michalnowak about 2 years ago

  • Subject changed from [sle][functional] test fails in grub_test - timeout 11k seems to be not enought for hiperv on installation to [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown

The VM was probably unable to reboot. The last screen shows the YaST reboot countdown, where it got stuck for 3 * 180 seconds. No clue in the logs.

#2 Updated by okurz about 2 years ago

  • Subject changed from [sle][functional][hyperv] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown
  • Due date set to 2018-03-27
  • Target version set to Milestone 15

#3 Updated by riafarov about 2 years ago

  • Subject changed from [sle][functional][y][yast][hyperv][fast] test fails in grub_test; stuck on reboot countdown to [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Description updated (diff)
  • Status changed from New to Workable

#4 Updated by riafarov about 2 years ago

We have had a lot of issues with Hyper-V recently; please communicate with mnowak.

#5 Updated by riafarov about 2 years ago

  • Subject changed from [sle][functional][y][yast][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown

I've removed the yast tag; I'm not sure how this is a YaST problem. Moving to the common backlog.

#6 Updated by SLindoMansilla about 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla

#8 Updated by SLindoMansilla about 2 years ago

3 of those jobs failed earlier, on partitioning_raid, because the expected item was not selected. Restarting those three jobs to see if they pass or fail on grub_test.

I think the problem on grub_test could be caused by missing keys, as this also happened on those three.

#9 Updated by mgriessmeier about 2 years ago

  • Due date changed from 2018-03-27 to 2018-04-10
  • Status changed from In Progress to Workable
  • Assignee deleted (SLindoMansilla)

Unassigning from Sergio since he's on vacation.

Next steps:

  • check statistics of mentioned jobs

#10 Updated by oorlov about 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to oorlov

#11 Updated by oorlov about 2 years ago

  • Subject changed from [sle][functional][hyperv][fast][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Status changed from In Progress to Workable
  • Assignee deleted (oorlov)

#12 Updated by cwh about 2 years ago

  • Difficulty set to medium

#13 Updated by okurz about 2 years ago

  • Subject changed from [sle][functional][hyperv][sporadic][medium] test fails in grub_test; stuck on reboot countdown to [sle][functional][hyperv][u][sporadic][medium] test fails in grub_test; stuck on reboot countdown
  • Assignee set to oorlov

As discussed in "sprint planning meeting II", assigning back to oorlov; this could be a good learning opportunity. Just gather information from jobs on our production instances; do not code anything locally, and do not even try to clone anything or set up your own worker.

#14 Updated by oorlov about 2 years ago

  • Status changed from Workable to In Progress

#15 Updated by oorlov about 2 years ago

Out of 136 test runs:

  • 2 failed in 'grub_test' (1578836, 1578795);
  • 14 failed earlier, in 'bootloader_hyperv';
  • 12 also failed earlier, in 'partitioning_raid';
  • 23 were interrupted automatically after 3 hours of test run, also before 'grub_test'.

So the result is 2 'grub_test' failures out of the 87 runs that reached it (136 - 14 - 12 - 23 = 87).
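
The arithmetic behind that figure can be checked with a short Python sketch. The counts are the ones listed above; the variable names are made up for illustration, and the failure rate is simply derived from those counts.

```python
# Counts taken from the list above.
total_runs = 136
failed_bootloader_hyperv = 14
failed_partitioning_raid = 12
interrupted_after_3h = 23
failed_grub_test = 2

# Runs that stopped before 'grub_test' say nothing about 'grub_test' itself,
# so they are excluded from the denominator.
stopped_earlier = (
    failed_bootloader_hyperv + failed_partitioning_raid + interrupted_after_3h
)
reached_grub_test = total_runs - stopped_earlier

failure_rate = failed_grub_test / reached_grub_test

print(f"runs that reached grub_test: {reached_grub_test}")
print(f"grub_test failure rate: {failure_rate:.1%}")
```

With only 2 failures in the sample, the rate is a rough indication at best.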

Also, after investigating the videos and logs of the failed tests, I noticed that the test actually fails in the previous module (reboot_after_installation). It sends 'alt-o' to reboot the system, but the "The system will reboot now..." popup is not closed. Then, after 10 seconds of waiting, 'grub_test' starts, but the popup is still there.

So my assumption is that the popup is stuck in the system itself: it has a countdown timer, but its value does not change after 10 seconds (it gets stuck on '7', or sometimes on '8').

To make sure that this is an issue in the system and not in our tests, I would add more logging and send 'alt-o' several times, with appropriate validations, in the 'reboot_after_installation' test, to determine whether the system is stuck or not.
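
A minimal sketch of that retry-with-validation idea, written in Python purely for illustration (it is not the openQA test API). The function `press_until_closed` and the callables `send_key` and `popup_visible` are hypothetical stand-ins for whatever the test would use to send a key and to check the screen; the 3 attempts and the 10-second wait are assumptions as well.

```python
import time
from typing import Callable


def press_until_closed(
    send_key: Callable[[str], None],      # hypothetical: sends a hotkey to the SUT
    popup_visible: Callable[[], bool],    # hypothetical: True while the popup is shown
    key: str = "alt-o",
    attempts: int = 3,                    # assumed retry count
    wait_seconds: float = 10.0,           # assumed wait between checks
    log: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send `key` up to `attempts` times; return True once the popup is gone."""
    for attempt in range(1, attempts + 1):
        log(f"attempt {attempt}/{attempts}: sending {key!r}")
        send_key(key)
        sleep(wait_seconds)
        if not popup_visible():
            log(f"popup closed after attempt {attempt}")
            return True
        log(f"popup still visible after attempt {attempt}")
    log("popup never closed; the system itself is probably stuck")
    return False
```

If the popup closes on a later attempt, a lost or too-early keypress is the likelier explanation; if it never closes, that points towards the system being stuck.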

#16 Updated by SLindoMansilla about 2 years ago

Hi oorlov,

Our installation test suites always stop the timer using the Stop button.
This is done to have enough time to gather logs before rebooting the system.
After gathering logs, we come back to the installation window and "press" the OK button to restart the system.

It is therefore expected that the counter is stopped. What is not expected is that the 'ok' button is not pressed. I assume sporadic missing keys (the hotkey 'alt-o' is not received by the SUT, or is received too early, when the graphical control is not yet ready).

#17 Updated by mgriessmeier about 2 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

Suggestions

  • Play around with increasing timeout and try to gather more statistics
  • check whether missing keys are in play
  • keep TIMEOUT_SCALE in mind when trying to reproduce locally
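
As a rough illustration of the last point, here is a Python sketch that assumes TIMEOUT_SCALE acts as a plain multiplier on each timeout. The helper `scaled_timeout` is hypothetical, the scale value of 2 is made up, and the 3 * 180 seconds figure is the wait mentioned in comment #1.

```python
def scaled_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Return the effective timeout, assuming a simple multiplicative scale."""
    if timeout_scale <= 0:
        raise ValueError("timeout_scale must be positive")
    return base_seconds * timeout_scale


# Three waits of 180 seconds each, unscaled.
total_wait = 3 * scaled_timeout(180)

# The same three waits with a hypothetical scale of 2.
total_wait_scaled = 3 * scaled_timeout(180, timeout_scale=2)

print(f"unscaled: {total_wait:.0f} s, scaled: {total_wait_scaled:.0f} s")
```

A run that passes locally with a larger scale is therefore not directly comparable to the production jobs.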

#18 Updated by mgriessmeier about 2 years ago

  • Due date changed from 2018-04-24 to 2018-05-08
  • Target version changed from Milestone 15 to Milestone 16

clarify with Michal how to continue here

#19 Updated by michalnowak about 2 years ago

mgriessmeier wrote:

clarify with Michal how to continue here

We don't have enough data to root-cause this, and as we haven't seen it lately... I suggest you close this.

#20 Updated by oorlov about 2 years ago

  • Status changed from In Progress to Rejected

Closed as it has not been reproduced for a long time.
