action #42863

closed

[sle][functional][u] test fails in first_boot - hard to see what is exactly the cause for timeout issue

Added by zluo about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA (private) - Milestone 23
Start date: 2018-10-24
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Compared with the last successful test run, the job still shows the grub menu and starts to boot, but first_boot runs out of time.
It is hard to determine the real cause of this failure. I checked the other logs and found no issue. Maybe we need to increase the timeout for wait_boot (aarch64)?

Observation

openQA test in scenario sle-12-SP4-Server-DVD-aarch64-minimal_x@aarch64 fails in first_boot

Reproducible

Fails since (at least) Build 0435 (current job)

Expected result

Last good: 0432 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 4 (0 open, 4 closed)

Related to openQA Tests (public) - action #42038: [sle][functional[u] test fails in shutdown - add post_fail_hook for shutdown module if possible (Resolved, zluo, 2018-10-05)
Related to openQA Tests (public) - action #43064: [functional][u] test fails in boot_into_snapshot and reboot_gnome with encrypted setup - test not waiting long enough for shutdown/reboot until we end up in grub which shows in post_fail_hook (Resolved, mgriessmeier, 2018-10-30)
Blocked by openQA Tests (public) - action #32683: [sle][functional][u][medium] Implement proper post_fail_hook for boot_to_desktop (Resolved, jorauch, 2018-03-02)
Blocks openQA Tests (public) - action #35892: [qe-core][functional][hard] test fails in kdump_and_crash - improve bootup/shutdown debugging approach (Rejected, SLindoMansilla, 2018-05-04)

Actions #1

Updated by szarate about 6 years ago

This is not related to the bootloader changes for aarch64; the SUT is already booting.

If you check the end of serial0.txt...

[    0.005048] ITS@0x8080000: Unable to locate ITS domain handle
[    0.005087] ITS@0x8080000: Unable to locate ITS domain handle

That could already be a starting point. If you look at the video, it stayed on the same screen for quite some time, and the SUT did not react when the test sent the esc key about 3 minutes later.

It seems a bit sporadic. I would suggest running the same scenario 100 times with memory dumps enabled, so that a developer can look at it if it happens again.
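
The repeated run suggested above could be scripted roughly as follows. This is a minimal sketch, not a command anyone in this ticket ran. It assumes `openqa-clone-job` is installed and that `--within-instance` is the right cloning mode for your setup. `<job-id>` is a placeholder, because the ID of the failing job is not stated here. No setting for memory dumps is included, since its name is not given in this ticket. By default the function only prints the commands (dry run).

```shell
#!/bin/sh
# Sketch: trigger the same openQA job RUNS times.
# HOST and JOB_ID are placeholders (assumptions), not values from this ticket.
HOST="${HOST:-https://openqa.suse.de}"
JOB_ID="${JOB_ID:-<job-id>}"
RUNS="${RUNS:-100}"
# The default "echo" only prints each command (dry run).
# Set RUNNER to an empty string to really execute the clones.
RUNNER="${RUNNER-echo}"

clone_many() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        # One clone per iteration; add KEY=VALUE overrides here if needed.
        $RUNNER openqa-clone-job --within-instance "$HOST" "$JOB_ID"
        i=$((i + 1))
    done
}

clone_many
```

Keeping the dry run as the default makes it possible to review the generated commands before 100 jobs are scheduled on a shared instance.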

Actions #2

Updated by zluo about 6 years ago

I found today that TW has the same issue: https://openqa.opensuse.org/tests/789009#step/first_boot/1

Actions #3

Updated by michel_mno about 6 years ago

Maybe the same problem with ppc64le snapshot 20181112: https://openqa.opensuse.org/tests/794116#step/first_boot/41

Actions #4

Updated by ggardet_arm about 6 years ago

Same problem (?) on TW aarch64 with gnome only (kde is fine): https://openqa.opensuse.org/tests/795069

Actions #5

Updated by okurz about 6 years ago

The last three jobs mentioned all seem to show the same problem, but one that differs from the original: the getty login prompt shows up on the serial device (multiple times), so I suspect it's a different issue.
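
One rough way to tell the two failure modes apart is to count how often a login prompt appears in a job's serial log. The sketch below assumes the log was downloaded as `serial0.txt` and that a getty prompt line contains the text `login:`. Both are assumptions to adapt, and the printed interpretation is only a hint.

```shell
#!/bin/sh
# Sketch: count getty login prompts in a serial log.
# Assumption: a prompt line contains "login:" (e.g. "hostname login: ").
count_login_prompts() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -c 'login:' "$1" || true
}

log="${1:-serial0.txt}"
if [ -r "$log" ]; then
    n=$(count_login_prompts "$log")
    if [ "$n" -gt 0 ]; then
        echo "$log: $n login prompt line(s); the system reached getty"
    else
        echo "$log: no login prompt; the boot likely stalled earlier"
    fi
else
    echo "cannot read $log" >&2
fi
```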

Actions #6

Updated by dimstar about 6 years ago

Very likely the ABI break of mozjs60, which I 'cured' in openSUSE:Factory by rebuilding everything against mozjs60.

Just executed, for ARM and PowerPC:

# Rebuild every package that depends on mozjs60, for ARM and PowerPC.
for ARCH in ARM PowerPC; do
 for pkg in $(dependson mozjs60); do
   osc rebuildpac "openSUSE:Factory:${ARCH}" "$pkg" -r standard
 done
done

Both archs should thus have gnome-shell (and other consumers) rebuilt against mozjs60 with the new ABI.

(I just triggered the same for libzypp, which also had an ABI break.)

Actions #7

Updated by mloviska about 6 years ago

Here is a similar problem on aarch64, where the SUT seems to boot, but it does not reach the login prompt.
https://openqa.suse.de/tests/2261868#step/first_boot/6

Actions #8

Updated by okurz about 6 years ago

  • Priority changed from Normal to High
  • Target version set to Milestone 21

yes, that looks more like it

Actions #9

Updated by pvorel about 6 years ago

We have a timeout problem in first_boot on LTP on the ipmi backend:
https://openqa.suse.de/tests/2262596#step/first_boot/2
Any idea what caused that?
The problem occurred with GRUB_TIMEOUT set first to 300 (good for IPMI) and even with 900.
But maybe #31375 is more closely related to our issue.

Actions #10

Updated by okurz about 6 years ago

@pvorel I think this is neither this ticket (#42863) nor #31375, but rather its own, new thing. The 'ret' key in the grub screen does not have an effect. Better to track it as its own issue.

Actions #11

Updated by okurz about 6 years ago

  • Related to action #42038: [sle][functional[u] test fails in shutdown - add post_fail_hook for shutdown module if possible added
Actions #12

Updated by okurz about 6 years ago

  • Related to action #43064: [functional][u] test fails in boot_into_snapshot and reboot_gnome with encrypted setup - test not waiting long enough for shutdown/reboot until we end up in grub which shows in post_fail_hook added
Actions #13

Updated by okurz about 6 years ago

  • Blocked by action #32683: [sle][functional][u][medium] Implement proper post_fail_hook for boot_to_desktop added
Actions #14

Updated by okurz about 6 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz
Actions #15

Updated by okurz about 6 years ago

  • Blocks action #35892: [qe-core][functional][hard] test fails in kdump_and_crash - improve bootup/shutdown debugging approach added
Actions #16

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: minimal_x+uefi
https://openqa.suse.de/tests/2326139

Actions #17

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 21 to Milestone 22
Actions #18

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 22 to Milestone 23
Actions #19

Updated by okurz almost 6 years ago

  • Status changed from Blocked to Resolved

Not failing anymore, and we have improved the generic post_fail_hook (see blocker).

https://openqa.suse.de/tests/2526251
