action #126068

open

[qe-core] Infinite grub timeout is not set for TW on o3

Added by pcervinka about 1 year ago. Updated 9 months ago.

Status: New
Priority: High
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2023-03-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-aarch64-ltp_dio@aarch64 fails in boot_ltp.

NOTE: this effectively breaks all aarch64 kernel testing on o3 (install_ltp often fails, so no LTP test is run: https://openqa.opensuse.org/tests/3178342#next_previous).

Test suite description

LTP_ENV=TMPDIR=/var/tmp/ is to test on btrfs instead of tmpfs

Reproducible

Fails since (at least) Build 20230313 (current job)

Expected result

Last good: 20230308 (or more recent)

Further details

Always latest result in this scenario: latest

LTP tests sometimes fail during boot on aarch64 (slow backend). We figured out that GRUB_TIMEOUT=-1 is not set.
You can see it in https://openqa.opensuse.org/tests/3173645/file/install_ltp-grub, which contains the grub config after installation.

The problem is in disable_grub_timeout during installation:
https://openqa.opensuse.org/tests/3172869#step/disable_grub_timeout/6

The timeout is left untouched; you can also download the installation video and check it frame by frame.
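
As a minimal sketch of how the missing setting could be checked automatically: the snippet below assumes the uploaded install_ltp-grub file uses the /etc/default/grub KEY=value format, which may not hold. The function names are hypothetical and are not part of the openQA test code.

```python
def grub_timeout(config_text):
    """Return GRUB_TIMEOUT from /etc/default/grub-style text, or None if unset.

    Assumption: the text consists of shell-style KEY=value lines.
    """
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, raw = line.partition("=")
        if sep and key.strip() == "GRUB_TIMEOUT":
            try:
                # Strip optional quotes; the last assignment wins, as in a shell file.
                value = int(raw.strip().strip("\"'"))
            except ValueError:
                value = None  # non-numeric value, treat as unset
    return value


def has_infinite_timeout(config_text):
    """True only if GRUB_TIMEOUT is explicitly set to -1 (wait forever)."""
    return grub_timeout(config_text) == -1


if __name__ == "__main__":
    sample = "GRUB_DEFAULT=saved\nGRUB_TIMEOUT=8\n"
    print(has_infinite_timeout(sample))
```

Running such a check right after the installation job would flag an untouched timeout before any child job tries to boot the image.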


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #128339: [qe-core][functional] children jobs use the qcow2 image which created via autoyast (Resolved, rfan1, 2023-04-11)

Actions #1

Updated by maritawerner about 1 year ago

  • Project changed from openQA Tests to 178
  • Category deleted (Bugs in existing tests)
Actions #2

Updated by pcervinka about 1 year ago

  • Project changed from 178 to openQA Tests

This is an installation issue, either core or yast, definitely not kernel. If it were, I would have already assigned it to our team.

Actions #3

Updated by pcervinka about 1 year ago

  • Project changed from openQA Tests to qe-yam
Actions #4

Updated by pvorel about 1 year ago

  • Description updated (diff)
  • Priority changed from Normal to High
Actions #5

Updated by JERiveraMoya 12 months ago

I believe that if the grub timeout is not set properly using needles, it is due to recent changes by QE Core in SLE, as we mainly use libyui-rest-api to interact with the installer.
Could you confirm, @szarate? (pinging him in Slack)
Also, it is a task for Kernel to not depend on interactive installation, although they will most likely face this bug with AutoYaST: https://bugzilla.suse.com/show_bug.cgi?id=1209083 On aarch64 there are autoyast_gnome and autoyast_minimal for inspiration to create your testing prerequisites and help with this known technical debt in your own squad.

Actions #6

Updated by JERiveraMoya 12 months ago

  • Project changed from qe-yam to openQA Tests
  • Subject changed from Infinite grub timeout is not set for TW on o3 to [qe-core] Infinite grub timeout is not set for TW on o3
Actions #7

Updated by openqa_review 11 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_xfstests
https://openqa.opensuse.org/tests/3238168#step/boot_to_desktop/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #8

Updated by pcervinka 11 months ago

JERiveraMoya wrote:

Also, it is a task for Kernel to not depend on interactive installation, although they will most likely face this bug with AutoYaST: https://bugzilla.suse.com/show_bug.cgi?id=1209083 On aarch64 there are autoyast_gnome and autoyast_minimal for inspiration to create your testing prerequisites and help with this known technical debt in your own squad.

To be honest, we already tried autoyast in the past (2 years ago) for baremetal installations, but due to an autoyast bug (which took a long time to fix) we reverted to the classical needle way, which usually needs fixing only at the test level (like this case; why it is taking so long is another question). Even your feedback contains a reference to an AY bug. Autoyast can blow up, especially if you need to maintain multiple architectures. Moreover, each baremetal server, even for x86_64, requires a specific profile due to hardware differences. All of this is sorted out in the needle-based installation, and you usually need to maintain only one needle flow for all archs and baremetals.

I would agree with you if autoyast were stable. Keep in mind that we are supposed to test the kernel and not play games with installation, so please don't mention technical debt, it's not fair. This is why we usually reuse images produced by other jobs.

Actions #9

Updated by JERiveraMoya 11 months ago

Yes, I understand; sometimes a bug can make you choose another path to test. But I don't have the impression that AutoYaST is unstable; it has bugs like any other component. If other squads had moved to AutoYaST in the past, more bugs would have been fixed. The one I mentioned surprised me by having existed for so long; it was only clear that it happened in some migrations and only on aarch64, so it didn't get enough attention to be fixed. I'm pretty sure that if it had blocked kernel tests it would have been fixed by now; the RM just needs to know how bad it is.

You mentioned something that is true: you need to maintain several files for several architectures. But that maintenance is not the same as code maintenance; it is just configuration in structured XML, so the complexity is much lower than maintaining the needle & shortcuts code base. We sometimes have several similar files, but they stay untouched for a long time. The opposite happens with the one-flow-with-needles approach, because there is a lot of spaghetti code (multi-product/multi-arch/multi-taste-of-tester, etc.), so you fix one thing and break another.

For a long time now, in the Yam squad we have not used needles and shortcuts for installation (to be honest, we have actually become less efficient at fixing those needles over the last year, because we didn't need to use testapi for that). We use the only tool that allows us to maintain the code properly, which is libyui-rest-api (it avoids those changes in shortcuts and other issues), and if we still use some needles somewhere, it is because we are trying to find the time to migrate to this technology but are not there yet. For that reason I can see that QE-Core is helping in those cases; historically they have had the image generation for other squads in their job group. We cannot have that; we cannot afford to maintain those dependencies for other squads due to capacity. But QE-Core is also migrating everything they can to AutoYaST, as far as I know.

By the way, we have been trying to help QE-Kernel with AutoYaST in O3 for s390x zVM, for at least a month, trying many things, but it doesn't work due to the infrastructure. That is not the case here, though; this is a classic qemu one.

That said, looking at how we could possibly help unblock the kernel tests if this doesn't move forward, I've just checked what Richard Fan was doing, and it seems there is an AutoYaST test suite which should be equivalent to the failing interactive one: https://openqa.opensuse.org/tests/3247995 I guess QE Kernel could use that one, after first asking Richard if it is ok, in order to unblock this issue.
Let me know if something doesn't work and we could also help there.

Actions #10

Updated by pcervinka 11 months ago

I don't understand where you are heading by suggesting that we use AY for jobs for which we are not primarily responsible. We shouldn't be responsible for generic image preparation jobs (unless they run on our specific hardware or relate to a very specific configuration). We have always reused existing images, like other teams. If someone decides to fix the reported problem, I don't mind how they do it. If the solution is to use AY, why not; we will just update the job dependencies. Or did something change at some point, and each team should now create its own qcow2 image preparation jobs?

Actions #11

Updated by JERiveraMoya 11 months ago

We have been sharing that info in the Weekly Sync for more than a year, and I think we have also talked about it in Slack in the past. The idea is that each squad becomes more and more independent and makes its own choices, having its own images if it runs something in the installed system.
You need to check with QE-Core (Richard's squad) about those generic images, as they are fine with maintaining them, but the Yam squad just needs to provide the test suite to avoid regressions and check that AutoYaST works in basic scenarios (without linking other squads' test suites to that one).

In other words, it is fine to have two test suites (jobs) doing the same thing. We might waste a few resources, but the management is cleaner, as each squad has its own stuff in its own job group (this is not the case for openSUSE, ok, but we can check the maintainer). Editing a few XML files, filing AutoYaST bugs on their own, or even contacting the Yam squad for help or cc'ing us in the bug is the way to go. That way all squads can also be fairer with each other regarding existing technical debt in the code.

Actions #12

Updated by pcervinka 11 months ago

@JERiveraMoya let's continue the discussion next week on the already scheduled call. Thank you.

Actions #13

Updated by szarate 11 months ago

  • Related to action #128339: [qe-core][functional] children jobs use the qcow2 image which created via autoyast added
Actions #14

Updated by openqa_review 10 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: install_ltp+opensuse+DVD
https://openqa.opensuse.org/tests/3311396#step/install_ltp/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #15

Updated by okurz 9 months ago

  • Category set to Bugs in existing tests