Project

General

action #80452

[qe-core][qem] Problems with aarch64 RAID 15SP1/SP2 QU tests - **Suggested Backport**

Added by tjyrinki_suse 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

Passed for 15SP1 16 days ago:
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=45.41&groupid=249

Failed in a later build:
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=47.3&groupid=249

(Same build, aarch64-specific rerun in the playground:)
https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=47.3rerun1&groupid=249

However, the failures are associated with changes made last Thursday, such as:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/16631d3018a1cddae2e5825670d1a066e10fb015
which led to a setup_libyui error:
https://openqa.suse.de/tests/5060842#step/setup_libyui/1
and, with a manual schedule omitting setup_libyui, to a raid_gpt error:
https://openqa.suse.de/tests/5063512#step/raid_gpt/1

Then Rodion mentioned that the YAML schedule should not be used and moved the tests back to a non-YAML schedule (even though the tests that passed 16 days earlier had used the YAML schedule):
https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/commit/0a9c964daa754911478d01a29ffb4d9148c79fdf

But that led to different errors, which in turn were partially fixed by:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/96c4dca21c8bd7d9b11cffd58c2f9985ed319f53

Before George started looking into this, at least RAID 0, 5 and 10 had been shown to pass at least once for 15SP1 Build 47.3, while RAID 1 and 6 remained problematic.

For 15SP2, everything passed 5 days ago at https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=375.8&groupid=321, but there are similar failures with the latest build (a rerun can get further, though): https://openqa.suse.de/tests/overview?distri=sle&version=15-SP2&build=376.1&groupid=321

SP2 is still using YAML.

History

#1 Updated by tjyrinki_suse 11 months ago

  • Description updated (diff)

#2 Updated by tjyrinki_suse 11 months ago

  • Description updated (diff)

#3 Updated by tjyrinki_suse 11 months ago

  • Priority changed from Normal to High

#4 Updated by tjyrinki_suse 11 months ago

  • Category set to Bugs in existing tests

#5 Updated by geor 10 months ago

  • Subject changed from [qe-core][qem] Problems with aarch64 RAID 15SP1/SP2 QU tests to [qe-core][qem] Problems with aarch64 RAID 15SP1/SP2 QU tests - **Suggested Backport**
  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Some reneedling was required; the aarch64 RAID jobs of the 15SP1 and 15SP2 QUs should now pass consistently.

As for the intermittent reconnect_mgmt_console failure, the problem actually originates in await_install.
The reboot message at the end of the installation has a default timeout of 10 seconds.
On some architectures, such as aarch64 and s390, the needle check in the await_install module sometimes does not match within those 10 seconds, so the reboot is not cancelled.
As a result, the machine reboots when it should not, and the next module, reconnect_mgmt_console, fails.
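The failure described above is a race between the installer's reboot countdown and the test's needle match. A minimal Perl sketch of that race follows; it is a simplified model, not the actual await_install or YaST logic. The sub `reboot_is_cancelled` and the latency values are hypothetical, and a timeout of 0 is assumed to mean "no countdown, wait indefinitely".

```perl
use strict;
use warnings;

# Simplified, hypothetical model of the race (not YaST or openQA code):
# the installer shows a reboot countdown of $timeout seconds, and the test
# needs $latency seconds to match the needle and cancel the reboot.
# A timeout of 0 is modelled as "no countdown", i.e. wait indefinitely.
sub reboot_is_cancelled {
    my ($timeout, $latency) = @_;
    return 1 if $timeout == 0;             # no countdown: the test always catches up
    return $latency < $timeout ? 1 : 0;    # otherwise the test must win the race
}

# Hypothetical latencies, chosen only to show both outcomes.
for my $case ([10, 4], [10, 12], [0, 12]) {
    my ($timeout, $latency) = @$case;
    printf "timeout=%-2d latency=%-2d -> %s\n", $timeout, $latency,
      reboot_is_cancelled($timeout, $latency) ? 'reboot cancelled' : 'machine reboots';
}
```

In this model a slow needle match only matters while a countdown is running, which is why removing the countdown, rather than speeding up the check, makes the outcome deterministic.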

To fix this issue, PR_1 and PR2 were introduced.
They allow the following usage, as seen in lib/bootloader_setup.pm:
push @params, 'reboot_timeout=' . get_var('REBOOT_TIMEOUT', 0) unless (is_leap('<15.2') || is_sle('<15-SP2'));

By default, the line above adds reboot_timeout=0 to the list of boot parameters. On 15-SP2, which contains the two aforementioned PR changes, this removes the timeout on the reboot message, so openQA has time to catch up.
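The quoted line can be exercised in isolation with stubbed helpers. This is a sketch under stated assumptions: `get_var`, `is_sle` and `is_leap` below are simplified stand-ins that read a hash instead of the real openQA job settings, only the `<VERSION` query form is handled, and `sle_number` and `boot_params` are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the test-library helpers used in
# lib/bootloader_setup.pm; the real ones read the openQA job settings,
# these read this hash instead.
my %vars = (DISTRI => 'sle', VERSION => '15-SP2');

sub get_var {
    my ($name, $default) = @_;
    return exists $vars{$name} ? $vars{$name} : $default;
}

# Hypothetical helper: "15-SP2" -> 15.2, "15" -> 15 (single-digit SPs only).
sub sle_number {
    my ($version) = @_;
    my ($major, $sp) = $version =~ /^(\d+)(?:-SP(\d+))?$/
      or die "unexpected SLE version '$version'";
    return $major + ($sp // 0) / 10;
}

# Stub: only the "<VERSION" form used in the quoted line is handled.
sub is_sle {
    my ($query) = @_;
    return 0 unless get_var('DISTRI', '') eq 'sle';
    my ($limit) = $query =~ /^<(.+)$/ or die "unsupported query '$query'";
    return sle_number(get_var('VERSION')) < sle_number($limit) ? 1 : 0;
}

# Stub: treats DISTRI=opensuse with a numeric VERSION as Leap.
sub is_leap {
    my ($query) = @_;
    return 0 unless get_var('DISTRI', '') eq 'opensuse';
    my ($limit) = $query =~ /^<(.+)$/ or die "unsupported query '$query'";
    return get_var('VERSION') < $limit ? 1 : 0;
}

# The line from the ticket, wrapped in a sub so it can be called per version.
sub boot_params {
    my @params;
    push @params, 'reboot_timeout=' . get_var('REBOOT_TIMEOUT', 0)
      unless (is_leap('<15.2') || is_sle('<15-SP2'));
    return @params;
}

print join(' ', boot_params()), "\n";    # prints "reboot_timeout=0" for 15-SP2
```

With these stubs, setting REBOOT_TIMEOUT overrides the default 0, and for SLE older than 15-SP2 (or Leap older than 15.2) no reboot_timeout parameter is added at all.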

However, 15-SP1 does not check this boot parameter, so there is no straightforward way to change or disable the timeout.
The suggested approach is to request a backport of this change to YaST in SLE 15-SP1.

Since a new 15-SP1 QU release is unlikely, the backport remains only a suggestion for now.
