action #167024

closed

coordination #163919: [epic] Create automation setup for testing Agama

Make stable agama in ppc64le by processing grub2 after installation finishes and reboot occurs

Added by JERiveraMoya 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 2024-09-19
Due date:
% Done: 0%
Estimated time:
Description

Motivation

Once the system is installed with agama, on slower architectures the screen is not drawn (fully or partially) in time, the 10 seconds we have to interact with grub2 run out, and the test fails sporadically, e.g.:
https://openqa.opensuse.org/tests/4486200#step/grub_test/2

With the openSUSE/SLES installer we solve this situation for automation by disabling the grub timeout in the UI https://openqa.opensuse.org/tests/4475892#step/disable_grub_timeout/20 but agama doesn't have this functionality (yet?).

In the same way that QE Yam created its own module to boot agama in QEMU, we could extract the useful code from the method grub_test in grub_utils.pm and create a simplified version (we can call it agama_grub2.pm) that is easier to maintain and might include some temporary adjustments to avoid the problem described above.

A small experiment trying to intercept the black screen succeeded. This is the relevant extract of the code:

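# Wait for either the grub2 menu or the not-yet-drawn (black) screen.
assert_screen([qw(grub2 grub2-black-screen)], $timeout);
if (match_has_tag("grub2-black-screen")) {
    # Press a key until the menu shows up, at most 9 times.
    for (1 .. 9) {
        send_key("up");
        last if check_screen("grub2", 0);
        # Note: Perl's built-in sleep truncates 0.5 to 0 unless Time::HiRes's sleep is imported.
        sleep 0.5;
    }
}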

https://openqa.opensuse.org/tests/overview?version=agama-9.0&distri=opensuse&build=jknphy%2Fos-autoinst-distri-opensuse%23stopgrubtimeout

Acceptance criteria

  • AC1: Reaching the grub screen after agama installs the system on ppc64le is stable.
  • AC2: Solution should not be included in the shared code, but encapsulated in a new test module, not breaking any other jobs.
Actions #1

Updated by JERiveraMoya 6 months ago

  • Subject changed from Make stable reaching the grub screen after agama installs the system in ppc64le to Make stable agama in ppc64le by processing grub2 after installation finishes and reboot occurs
Actions #2

Updated by hjluo 6 months ago

  • Status changed from Workable to In Progress
  • Assignee set to hjluo
Actions #3

Updated by JERiveraMoya 6 months ago

  • Description updated (diff)

For this ticket I don't recommend changing the existing code for grub_test. I find it really complicated to work with code like that, where a failed attempt breaks most of the jobs in openQA. I will add a requirement that this be encapsulated in a new module so that nothing breaks.

Actions #5

Updated by hjluo 6 months ago

PR was merged.

Actions #6

Updated by hjluo 6 months ago

  • Rerun 23.2 to verify.
Actions #7

Updated by JERiveraMoya 6 months ago

hjluo wrote in #note-6:

  • Rerun 23.2 to verify.

I don't think we should run this on architectures other than ppc64le. I can see that the black screen matches on x86_64 before the system shuts down, and on s390x it will break after the installation passes. You need to create POMs similar to those in boot_agama to handle this without using conditions.

Actions #8

Updated by hjluo 6 months ago

OK, I'll draft another PR to do it with a POM.

Actions #9

Updated by JERiveraMoya 6 months ago · Edited

  • Tags changed from qe-yam-sep-sprint to qe-yam-oct-sprint

Agreed to create a new Page Object for the grub page with a couple of methods to check that the screen is visible and to enter the selected menu entry.
It should run only on ppc64le. Contact @jfernandez or @leli.
Also check whether, after bumping the RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.
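
A minimal sketch of what such a Page Object could look like, reusing the black-screen retry from the description. The package name, method names, default timeout and the injected `api` hash are assumptions made for illustration, not the merged code. Injecting the functions lets the sketch run without os-autoinst; in a real test module they would be the testapi functions assert_screen, match_has_tag, check_screen and send_key.

```perl
package Yam::Agama::Pom::GrubMenuSketch;    # hypothetical name, not from the ticket
use strict;
use warnings;

# The constructor takes code references so the sketch runs standalone;
# in openQA they would be the testapi functions of the same name.
sub new {
    my ($class, %args) = @_;
    my $self = {
        api         => $args{api},                # {assert_screen, match_has_tag, check_screen, send_key}
        tag_menu    => 'grub2',
        tag_black   => 'grub2-black-screen',
        max_retries => $args{max_retries} // 9,   # same retry count as the experiment
    };
    return bless $self, $class;
}

# Wait for the menu; if only the black screen matched, press "up" until the
# menu is drawn. Returns 1 when the menu is visible, 0 if it never appeared.
sub expect_is_shown {
    my ($self, %args) = @_;
    my $api = $self->{api};
    # The default of 90 seconds is an assumption, not a value from the ticket.
    $api->{assert_screen}->([$self->{tag_menu}, $self->{tag_black}], $args{timeout} // 90);
    return 1 unless $api->{match_has_tag}->($self->{tag_black});
    for (1 .. $self->{max_retries}) {
        $api->{send_key}->('up');
        return 1 if $api->{check_screen}->($self->{tag_menu}, 0);
    }
    return 0;
}

# Boot the currently selected menu entry.
sub boot_selected_entry {
    my ($self) = @_;
    $self->{api}{send_key}->('ret');
    return;
}

1;
```

Keeping the needle tags and the retry loop inside the Page Object means the test module only calls expect_is_shown and boot_selected_entry, and scheduling that module only for ppc64le avoids architecture conditions in the code.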

Actions #10

Updated by hjluo 6 months ago · Edited

Also check whether, after bumping the RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.

Actions #11

Updated by JERiveraMoya 6 months ago

hjluo wrote in #note-10:

Also check whether, after bumping the RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.

8 GiB of RAM is quite a lot, so there is probably an issue somewhere, but that is a good finding. If we experience the same in OSD we can file a bug. I'm not really sure about the workers in openSUSE for Power, and the virtualization there is rather old.
Could you try 20 runs with these two increments?
(1) 20x: (4096 + 1024)
(2) 20x: (4096 + 2048) (only if the one above doesn't work).

Finally, to resolve the ticket we need to remove the workaround and the schedule file schedule/yam/agama_ppc64le.yaml, and adjust all agama job groups where it is present (O3, OSD).

Actions #12

Updated by JERiveraMoya 6 months ago · Edited

JERiveraMoya wrote in #note-11:

hjluo wrote in #note-10:

Also check whether, after bumping the RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.

8 GiB of RAM is quite a lot (I didn't notice we were already at 4 GB when I suggested doubling it), and TW can be installed with 4 GB, so there is probably an issue somewhere, but that is a good finding. If we experience the same in OSD we can file a bug. I'm not really sure about the workers in openSUSE for Power, and the virtualization there is rather old.
Could you try 20 runs with these two increments, to keep the bump as small as possible?
(1) 20x: (4096 + 1024)
(2) 20x: (4096 + 2048) (only if the one above doesn't work).

Finally, to resolve the ticket we need to remove the workaround and the schedule file schedule/yam/agama_ppc64le.yaml, and adjust all agama job groups where it is present (O3, OSD).

Actions #13

Updated by hjluo 6 months ago · Edited

Actions #14

Updated by hjluo 6 months ago

Actions #15

Updated by JERiveraMoya 6 months ago · Edited

hjluo wrote in #note-14:

It keeps failing, so let's try:

  • 20x: (4096 + 3*1024)
  • 20x: (4096 + 4*1024) -> that is exactly double, which you already tried, but I would like to see it again with more runs (and more chances that other workers are picked) to be sure.
Actions #16

Updated by hjluo 6 months ago · Edited

JERiveraMoya wrote in #note-15:

hjluo wrote in #note-14:

It keeps failing, so let's try:

  • 20x: (4096 + 3*1024)
  • 20x: (4096 + 4*1024) -> that is exactly double, which you already tried, but I would like to see it again with more runs (and more chances that other workers are picked) to be sure.
Actions #17

Updated by hjluo 5 months ago

Test run summary:

  • 20x: (4096 + 2048) -> (ALL PASSED)
  • 20x: (4096 + 3*1024) -> (2 out of 20 FAILED)
  • 20x: (4096 + 4*1024) -> (ALL PASSED)
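
For reference, the increments above translate into the following total RAM values in MiB. This is only a small sketch of the arithmetic; the helper name qemuram_for is made up for illustration, and the outcomes are copied from the summary above.

```perl
use strict;
use warnings;

my $BASE_MIB = 4096;    # RAM the job already had, per the notes above

# Total RAM in MiB for a given increment over the base.
sub qemuram_for {
    my ($extra_mib) = @_;
    return $BASE_MIB + $extra_mib;
}

# Increments and outcomes exactly as reported in the summary above.
my @runs = (
    [2048,     'all passed'],
    [3 * 1024, '2 out of 20 failed'],
    [4 * 1024, 'all passed'],
);

printf "QEMURAM=%d: %s\n", qemuram_for($_->[0]), $_->[1] for @runs;
# prints:
# QEMURAM=6144: all passed
# QEMURAM=7168: 2 out of 20 failed
# QEMURAM=8192: all passed
```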
Actions #18

Updated by JERiveraMoya 5 months ago

Let's set that to 6144 only in this devel group, then.
You can open a PR to remove the workaround, but we will merge it only once we are sure it is not needed in other products.

Actions #19

Updated by hjluo 5 months ago

  • Changed group 116 ppc64le->agama_default:QEMURAM: '6144'.
  • PR_#20312 to remove previous PR#20236
Actions #20

Updated by JERiveraMoya 5 months ago

  • Status changed from In Progress to Resolved