action #167383

closed

coordination #169270: [SLE-Micro][epic] Establish ppc64 test runs for SLEM6

test fails in disk_boot for toolbox

Added by mdati 3 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
2024-09-25
Due date:
% Done:

100%

Estimated time:

Description

Observation

openQA test in scenario sle-micro-6.1-Container-Image-Updates-x86_64-sle_micro_toolbox_image@64bit repeatedly failed in
disk_boot, see previous runs.

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 6.1_3.6

Expected result

Last good: 6.1_3.5 (or more recent)

Further details

Always latest result in this scenario: latest

Note

On the boot menu, the Enter key press that selects the first item is missing.


Files

3-boot_ERRORc.png (12.3 KB): Error device missing /dev/disk/* (mdati, 2024-10-03 08:28)

Related issues 1 (0 open, 1 closed)

Blocks Containers and images - action #162689: [sle-micro 6.1] add ppc64le image into testing queue (Resolved, mdati, 2024-06-21)

Actions #1

Updated by mdati 3 months ago

  • Tags changed from firstboot to firstboot, toolbox
  • Project changed from openQA Tests (public) to Containers and images
  • Category deleted (Bugs in existing tests)

Note: I paused the test for debugging over VNC, pressed Enter manually, and resumed the run; the test then continued normally and passed: https://openqa.suse.de/tests/15527898.

On the next rerun it fails again, because the automated Enter key press that selects the menu entry is still missing.

Actions #2

Updated by mdati 2 months ago · Edited

  • Tags changed from firstboot, toolbox to toolbox, boot
  • Subject changed from test fails in disk_boot for toolbox x86 to test fails in disk_boot for toolbox

In group SLE Micro Toolbox updates, build 6.1_3.7,
the x86_64 test passed on the next run (a clone that I paused to press the right key manually);
on ppc64le it still fails in [disk_boot](https://openqa.suse.de/tests/15547875#step/disk_boot/2).

Actions #3

Updated by mdati 2 months ago

  • Related to action #162689: [sle-micro 6.1] add ppc64le image into testing queue added
Actions #4

Updated by mdati 2 months ago · Edited

Investigation showed that with KEEP_GRUB_TIMEOUT=1 the [disk_boot] module skips the GRUB checks, so the boot menu keeps waiting for a (manual) Enter key press; with KEEP_GRUB_TIMEOUT=0 the boot proceeds, GRUB included.
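
The observation above can be written down as a small toy model. This is only a sketch of what this note reports, not the code of the disk_boot module; the function name `boot_menu_outcome` and the returned strings are made up for illustration.

```python
def boot_menu_outcome(keep_grub_timeout):
    """Toy model of the behaviour reported in this note (hypothetical helper)."""
    if keep_grub_timeout == 1:
        # Reported: the GRUB checks are skipped, so nothing sends Enter
        # and the boot menu keeps waiting for a manual key press.
        return "waiting at boot menu"
    # Reported: with KEEP_GRUB_TIMEOUT=0 the GRUB handling runs and the boot goes on.
    return "boot proceeds"


for value in (1, 0):
    print(f"KEEP_GRUB_TIMEOUT={value}: {boot_menu_outcome(value)}")
```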

MR 1844 by jlausuch, which changes this setting in the template, has been merged.

However, another issue appeared during the boot phase (VR 15581119): a device is missing, see device missing.
Because of that, the system returns to the first boot menu in a loop, so we still have to fix this for the module to pass.

Actions #5

Updated by mdati 2 months ago

  • Status changed from New to Workable
Actions #6

Updated by mdati 2 months ago

  • Related to deleted (action #162689: [sle-micro 6.1] add ppc64le image into testing queue)
Actions #7

Updated by mdati 2 months ago

  • Blocks action #162689: [sle-micro 6.1] add ppc64le image into testing queue added
Actions #8

Updated by mdati 2 months ago · Edited

The problem may be in the SLEM 6.1 ppc64le [HDD_1 creation phase] or in the raw image itself.

The HDD_1 images of all other architectures are OK; their tests pass in the SLE-M Toolbox group.

Actions #9

Updated by mdati 2 months ago · Edited

  • Status changed from Workable to In Progress
  • Assignee set to mdati
Actions #10

Updated by ph03nix 2 months ago

I see that this is still failing in https://openqa.suse.de/tests/15616178#step/disk_boot/13 with "no suitable video mode found". This test run uses the SL-Micro.ppc64le-6.1-Default-Updated.qcow2 HDD image produced in https://openqa.suse.de/tests/15618478, which works fine. I wonder where the culprit is?

@mdati do you know more?

Actions #11

Updated by mdati 2 months ago · Edited

In reply to https://progress.opensuse.org/issues/167383#note-10:

No, https://openqa.suse.de/tests/15618478 does NOT work fine: the HDD is merely published successfully.

As already explained in https://progress.opensuse.org/issues/167383#note-4, when that new HDD is used in Container-Image-Updates-ppc64le it fails in this step with device missing. The error no suitable video mode found is "minor": it is not the blocking one and it recovers by itself. In fact, the expected GRUB menu reappears and stops there, waiting for another Enter key press that never comes.
I have also seen that minor error on images that booted fine.

It is a loop:

  • the test starts at the GRUB menu,
  • Enter is sent automatically,
  • the boot phase proceeds, but then the error device missing occurs,
  • the system falls back to GRUB and the minor error occurs,
  • then the GRUB menu appears again.

If you press Enter again at this point, the loop reruns.
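
The loop above can be traced with a few lines of Python. This is only an illustrative sketch: the step names in `BOOT_LOOP` paraphrase the list above, and the helper `trace` is hypothetical, not part of openQA.

```python
from itertools import cycle, islice

# One pass through the loop, as described in the list above (paraphrased).
BOOT_LOOP = [
    "GRUB menu shown",
    "Enter sent",
    "boot proceeds",
    "error: device missing",
    "fallback to GRUB, minor error: no suitable video mode found",
]


def trace(steps):
    """Return the first `steps` states, cycling through BOOT_LOOP forever."""
    return list(islice(cycle(BOOT_LOOP), steps))


# After one full pass the sequence is back at the GRUB menu.
for number, state in enumerate(trace(len(BOOT_LOOP) + 1), start=1):
    print(number, state)
```

The point of the sketch is that no state in the cycle leads to a booted system, so sending Enter again only restarts the same sequence.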

Actions #12

Updated by mdati 2 months ago · Edited

Next step:
I ran many tests in group 451 on sle-m 6.1 ppc64le, publishing as qcow2 or as raw.xz, or using other images in the test, but the published image was always unbootable.
To resume podman testing in the Container Hosts group 513 and unblock poo#162689, as a workaround I am (temporarily) skipping the creation of that new qcow2 HDD and working directly on the ppc64 raw.xz images as HDD_1, https://openqa.suse.de/tests/15647678, with the proper pre-updates:
(a) I added CONTAINER_IMAGE_TO_TEST 6.1 to replace the default toolbox 5.5, then
(b) I triggered install_updates (by adding "-QR" to the flavor name) to add the missing certificates for pulling from the registry.

But then a new issue appeared in the image_podman validation, https://openqa.suse.de/tests/15641792#step/image_podman/114, and I am testing some code changes in https://openqa.suse.de/tests/15647678.

Actions #13

Updated by mdati 2 months ago · Edited

See the note in the resumed poo#162689: https://progress.opensuse.org/issues/162689#note-28 (WIP). To update the certificates in item (b), a different solution is used than the one in note-12 above.

Actions #14

Updated by ph03nix about 2 months ago

This ticket can IMHO be closed in favor of https://bugzilla.suse.com/show_bug.cgi?id=1227509.

Actions #15

Updated by mdati about 2 months ago · Edited

This issue was caused by some settings in the HDD creation tests of group 377: those tests passed, but left the ppc image no longer bootable in the tests of group 451.
Felix found that the problematic setting is HDDSIZEGB and created MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1862 to remove it; it has been merged.
Verification runs were also done and passed, confirming that the parameters are valid: https://openqa.suse.de/tests/15702362

Actions #16

Updated by ph03nix about 2 months ago

  • Tags changed from toolbox, boot to containers
Actions #17

Updated by mdati about 2 months ago · Edited

The last SLE Micro Toolbox updates build, 6.1_3.16, failed due to problematic settings.
Felix's MR https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/1874 fixed the issue (HDDSIZEGB_1='').
Currently the toolbox tests pass for all 4 architectures.
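
Since notes #15 and #17 both point at HDDSIZEGB-style settings, a quick check of a job's settings for such keys may help when reviewing similar failures. The sketch below is an assumption-laden illustration: the helper `find_hddsize_settings` is hypothetical, and the sample dictionary (including the value `"30"`) is invented, not taken from the merge requests.

```python
def find_hddsize_settings(settings):
    """Return every HDDSIZEGB* key with its value, for manual review."""
    return {
        key: value
        for key, value in settings.items()
        if key.startswith("HDDSIZEGB")
    }


# Invented sample settings; only the key names come from this ticket.
sample = {
    "ARCH": "ppc64le",
    "HDDSIZEGB": "30",   # hypothetical value
    "HDDSIZEGB_1": "",
}

for key, value in sorted(find_hddsize_settings(sample).items()):
    print(f"{key}={value!r}")
```

The helper deliberately does not judge which value is correct; it only lists the candidates so that they can be compared against the templates.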

Actions #18

Updated by mdati about 2 months ago · Edited

  • Status changed from In Progress to Feedback

Within the scope of this poo for 6.1 ppc64le, the boot now works, and even the full test passes.

Also consider that it was decided to disable the toolbox sle-m 6.1 pipeline: MR317.

Therefore no new 6.1 build will be run.

So this ticket can be closed.

Actions #19

Updated by mdati about 2 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100
Actions #20

Updated by ph03nix about 1 month ago

  • Parent task set to #169270