action #167024
coordination #163919 (closed): [epic] Create automation setup for testing Agama
Make stable agama in ppc64le by processing grub2 after installation finishes and reboot occurs
0%
Description
Motivation
Once the system is installed with Agama, on slower architectures the screen is not drawn (fully or partially) in time, so the 10 seconds we have to interact with grub2 run out and the test fails sporadically, e.g.:
https://openqa.opensuse.org/tests/4486200#step/grub_test/2
With the openSUSE/SLES installer we solve this situation for automation by disabling the grub timeout in the UI https://openqa.opensuse.org/tests/4475892#step/disable_grub_timeout/20 but Agama doesn't have this functionality (yet?).
In the same fashion that QE Yam created its own module to boot Agama in QEMU, we could extract the useful code from the method grub_test in grub_utils.pm and create a simplified version (we can call it agama_grub2.pm) that is easier to maintain and might include some temporary adjustments to avoid the problem described above.
A small experiment trying to intercept the black screen succeeded; this is the relevant extract of the code:
assert_screen([qw(grub2 grub2-black-screen)], $timeout);
if (match_has_tag "grub2-black-screen") {
    for (1 .. 9) {
        send_key("up");
        last if check_screen("grub2", 0);
        sleep 0.5;
    }
}
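The retry in the extract can be isolated into a small helper so that the attempt count and the pause are explicit. The following is only a sketch under assumptions: `press_until_shown` and its parameters are hypothetical names, and the two callbacks stand in for `send_key("up")` and `check_screen("grub2", 0)` so that the logic can run outside openQA. It imports `sleep` from Time::HiRes because Perl's built-in `sleep` only takes whole seconds.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # built-in sleep takes whole seconds only

# Hypothetical helper: call $args{press} until $args{is_shown} returns true,
# giving up after $attempts tries. Returns the number of the try that
# succeeded, or 0 if the screen never appeared.
sub press_until_shown {
    my (%args) = @_;
    my $attempts = $args{attempts} // 9;
    my $interval = $args{interval} // 0.5;
    for my $try (1 .. $attempts) {
        $args{press}->();
        return $try if $args{is_shown}->();
        sleep($interval);
    }
    return 0;
}

# Stand-alone demonstration with stubs: the menu "appears" on the third key press.
my $presses = 0;
my $try = press_until_shown(
    press    => sub { $presses++ },
    is_shown => sub { $presses >= 3 },
    interval => 0,
);
print "grub2 shown after try $try\n";
```

Inside an openQA test module the callbacks would become `sub { send_key('up') }` and `sub { check_screen('grub2', 0) }`. A return value of 0 would then be the natural place to fail the test explicitly instead of continuing silently.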
Acceptance criteria
- AC1: Reaching the grub screen after Agama installs the system on ppc64le is stable.
- AC2: The solution should not be included in the shared code, but encapsulated in a new test module, without breaking any other jobs.
Updated by JERiveraMoya 6 months ago
- Subject changed from Make stable reaching the grub screen after agama installs the system in ppc64le to Make stable agama in ppc64le by processing grub2 after installation finishes and reboot occurs
Updated by JERiveraMoya 6 months ago
- Description updated (diff)
In this ticket I don't recommend changing the existing code for grub_test. I find it really complicated to work with something like that, where you will break most of the jobs in openQA if you don't succeed. I will add a requirement that this needs to be encapsulated in a new module so as not to break anything.
Updated by JERiveraMoya 6 months ago
hjluo wrote in #note-6:
- Rerun 23.2 to verify.
I don't think we should run this on any architecture other than ppc64: I can see that the black screen matches on x86_64 before the system shuts down, and on s390x it will break after passing the installation. You need to create POMs similar to what is in boot_agama to handle this without using conditions.
Updated by JERiveraMoya 6 months ago · Edited
- Tags changed from qe-yam-sep-sprint to qe-yam-oct-sprint
Agreed to create a new Page Object for the grub page with a couple of methods to check that the screen is visible and to enter the selected menu.
Should run only on ppc64le. Contact @jfernandez or @leli.
Also check whether, after bumping RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.
Updated by hjluo 6 months ago · Edited
Also check whether, after bumping RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.
- 10 runs with double RAM for agama_default: 9 out of 10 passed. So do we need to revert the previous PR? Thanks.
Updated by JERiveraMoya 6 months ago
hjluo wrote in #note-10:
Also check whether, after bumping RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.
- 10 runs with double RAM for agama_default: 9 out of 10 passed. So do we need to revert the previous PR? Thanks.
8GiB of RAM is quite a lot, so there is probably some issue somewhere. Still, it is a good finding: if we experience the same in OSD we can file a bug. I'm not really sure about the workers in openSUSE for Power; the virtualization there is also kind of old.
Might I ask you to try 20 runs with these two increments?
(1) 20x: (4096 + 1024)
(2) 20x: (4096 + 2048) (only if the one above doesn't work).
Finally, to resolve the ticket we need to remove the workaround and also the schedule file schedule/yam/agama_ppc64le.yaml, and adjust all Agama job groups where it is present (O3, OSD).
Updated by JERiveraMoya 6 months ago · Edited
JERiveraMoya wrote in #note-11:
hjluo wrote in #note-10:
Also check whether, after bumping RAM (double?) and running 10x, we always see grub2; in that case there is no need for coding and we can revert the current fix.
- 10 runs with double RAM for agama_default: 9 out of 10 passed. So do we need to revert the previous PR? Thanks.
8GiB of RAM is quite a lot (I didn't notice we were already on 4GB when I suggested doubling it), and TW can be installed with 4GB, so there is probably some issue somewhere. Still, it is a good finding: if we experience the same in OSD we can file a bug. I'm not really sure about the workers in openSUSE for Power; the virtualization there is also kind of old.
Might I ask you to try 20 runs with these two increments to minimize the bumping?
(1) 20x: (4096 + 1024)
(2) 20x: (4096 + 2048) (only if the one above doesn't work).
Finally, to resolve the ticket we need to remove the workaround and also the schedule file schedule/yam/agama_ppc64le.yaml, and adjust all Agama job groups where it is present (O3, OSD).
Updated by hjluo 6 months ago · Edited
- (1) 20x: (4096 + 1024)
- 4096+1024_with_20_run
- Result: 5 out of 20 runs FAILED.
Updated by JERiveraMoya 6 months ago · Edited
hjluo wrote in #note-14:
- (2) 20x: (4096 + 2048) (only if the one above doesn't work)
It keeps failing, so let's then try:
- 20x: (4096 + 3*1024)
- 20x: (4096 + 4*1024) -> that is exactly double, which you already tried, but I would like to see it again with more runs (and more chances that other workers will be picked) to be sure.
Updated by hjluo 6 months ago · Edited
JERiveraMoya wrote in #note-15:
hjluo wrote in #note-14:
- (2) 20x: (4096 + 2048) (only if the one above doesn't work)
- 20x: (4096 + 2048) run -> (ALL PASSED)
It keeps failing, so let's then try:
- 20x: (4096 + 3*1024)
- 20x: (4096 + 4*1024) -> that is exactly double, which you already tried, but I would like to see it again with more runs (and more chances that other workers will be picked) to be sure.
- 20x: (4096 + 3*1024) -> (2 out of 20 FAILED)
- 20x: (4096 + 4*1024) -> (ALL PASSED)
Updated by JERiveraMoya 5 months ago
Let's set it to 6144 only in this devel group, then.
You can open a PR to remove the workaround, but we will merge it only when we are sure it is not needed in other products.
Updated by JERiveraMoya 5 months ago
- Status changed from In Progress to Resolved