action #126068: [qe-core] Infinite grub timeout is not set for TW on o3 - openQA Tests (public) - openSUSE Project Management Tool

action #126068

open

[qe-core] Infinite grub timeout is not set for TW on o3

Added by pcervinka almost 2 years ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2023-03-15
Due date:
% Done:
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-aarch64-ltp_dio@aarch64 fails in
boot_ltp.

NOTE: this effectively breaks all aarch64 kernel testing on o3 (install_ltp often fails, so no LTP tests are run: https://openqa.opensuse.org/tests/3178342#next_previous).

Test suite description

LTP_ENV=TMPDIR=/var/tmp/ is to test on btrfs instead of tmpfs

Reproducible

Fails since (at least) Build 20230313 (current job)

Expected result

Last good: 20230308 (or more recent)

Further details

Always latest result in this scenario: latest

LTP tests sometimes fail during boot on aarch64 (slow backend). We figured out that GRUB_TIMEOUT=-1 is not set.
You can see it in https://openqa.opensuse.org/tests/3173645/file/install_ltp-grub which contains the GRUB configuration after installation.

The problem is in disable_grub_timeout during installation:
https://openqa.opensuse.org/tests/3172869#step/disable_grub_timeout/6

The timeout is left untouched. You can also download the installation video and check it frame by frame.
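
As a rough sketch (not part of the openQA test code), the check on the uploaded file could be automated roughly like this. It assumes the file uses the usual /etc/default/grub layout of `KEY=value` lines; the function names `grub_timeout` and `has_infinite_timeout` are hypothetical and only used for illustration.

```python
def grub_timeout(text):
    """Return the effective GRUB_TIMEOUT as an int, or None if it is not set.

    Assumes /etc/default/grub style content: one KEY=value assignment per
    line, '#' starting a comment line. The last assignment wins, as it
    would when the file is sourced by a shell.
    """
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skips comments and other keys such as GRUB_TIMEOUT_STYLE.
        if not line.startswith("GRUB_TIMEOUT="):
            continue
        candidate = line.split("=", 1)[1].strip().strip("\"'")
        try:
            value = int(candidate)
        except ValueError:
            # Not a plain integer; ignore this line in the sketch.
            continue
    return value


def has_infinite_timeout(text):
    """True only if the effective GRUB_TIMEOUT is -1 (wait forever)."""
    return grub_timeout(text) == -1


if __name__ == "__main__":
    # Made-up sample content, not taken from the job linked above.
    sample = 'GRUB_DEFAULT=saved\nGRUB_TIMEOUT=8\nGRUB_TIMEOUT_STYLE=menu\n'
    print(grub_timeout(sample))
    print(has_infinite_timeout(sample))
```

Running something like this against the downloaded install_ltp-grub file would show directly whether disable_grub_timeout had any effect, without going through the video.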

Related issues 1 (0 open — 1 closed)

Updated by maritawerner almost 2 years ago

Project changed from openQA Tests (public) to 178
Category deleted (~~Bugs in existing tests~~)

Updated by pcervinka almost 2 years ago

Project changed from 178 to openQA Tests (public)

This is an installation issue, either core or YaST, definitely not kernel. If it were, I would have already assigned it to our team.

Updated by pcervinka almost 2 years ago

Project changed from openQA Tests (public) to qe-yam

Updated by pvorel almost 2 years ago

Description updated (diff)
Priority changed from Normal to High

Updated by JERiveraMoya almost 2 years ago

I believe that if the grub timeout is not set properly using needles, it is due to recent changes by QE Core in SLE, as we mainly use libyui-rest-api to interact with the installer.
Could you confirm, @szarate? (pinging him in Slack)
Also, it is a task for Kernel to not depend on interactive installation, although most likely they will hit this bug with AutoYaST: https://bugzilla.suse.com/show_bug.cgi?id=1209083 There are autoyast_gnome and autoyast_minimal on aarch64 as inspiration for creating your testing prerequisites and helping with this known technical debt in your own squad.

Updated by JERiveraMoya almost 2 years ago

Project changed from qe-yam to openQA Tests (public)
Subject changed from Infinite grub timeout is not set for TW on o3 to [qe-core] Infinite grub timeout is not set for TW on o3

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_xfstests
https://openqa.opensuse.org/tests/3238168#step/boot_to_desktop/1

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released" or "EOL" (End-of-Life)
The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Updated by pcervinka almost 2 years ago

JERiveraMoya wrote:

Also, it is a task for Kernel to not depend on interactive installation, although most likely they will hit this bug with AutoYaST: https://bugzilla.suse.com/show_bug.cgi?id=1209083 There are autoyast_gnome and autoyast_minimal on aarch64 as inspiration for creating your testing prerequisites and helping with this known technical debt in your own squad.

To be honest, we already tried AutoYaST in the past (2 years ago) for baremetal installations, but due to an AutoYaST bug (which took a long time to fix) we reverted to the classical needle-based way, which usually only needs fixing at the test level (as in this case; why it is taking so long is another question). Even your feedback contains a reference to an AY bug. AutoYaST can blow up, especially if you need to maintain multiple architectures. Moreover, each baremetal server, even for x86_64, requires a specific profile due to hardware differences. All of this is sorted out in the needle-based installation, and you usually need to maintain only one needle flow for all archs and baremetals.

I would agree with you if AutoYaST were stable. Keep in mind that we are supposed to test the kernel and not play games with installation, so please don't mention technical debt, it's not fair. This is why we usually reuse images produced by other jobs.

Updated by JERiveraMoya almost 2 years ago

pcervinka wrote:

JERiveraMoya wrote:

Also, it is a task for Kernel to not depend on interactive installation, although most likely they will hit this bug with AutoYaST: https://bugzilla.suse.com/show_bug.cgi?id=1209083 There are autoyast_gnome and autoyast_minimal on aarch64 as inspiration for creating your testing prerequisites and helping with this known technical debt in your own squad.

To be honest, we already tried AutoYaST in the past (2 years ago) for baremetal installations, but due to an AutoYaST bug (which took a long time to fix) we reverted to the classical needle-based way, which usually only needs fixing at the test level (as in this case; why it is taking so long is another question). Even your feedback contains a reference to an AY bug. AutoYaST can blow up, especially if you need to maintain multiple architectures. Moreover, each baremetal server, even for x86_64, requires a specific profile due to hardware differences. All of this is sorted out in the needle-based installation, and you usually need to maintain only one needle flow for all archs and baremetals.

I would agree with you if AutoYaST were stable. Keep in mind that we are supposed to test the kernel and not play games with installation, so please don't mention technical debt, it's not fair. This is why we usually reuse images produced by other jobs.

Yes, I understand. Sometimes a bug can make you choose another path to test, but I don't have the impression that AutoYaST is unstable; it has bugs like any other component. If other squads had moved to AutoYaST in the past, more bugs would have been fixed. The one I mentioned surprised me by having existed for so long; it was only known to happen in some migrations and only on aarch64, so it didn't get enough attention to be fixed. I'm pretty sure that if it had blocked kernel tests it would have been fixed by now; the RM just needs to know how bad it is.

You mentioned something that is true: you need to maintain several files for several architectures. But that maintenance is not the same as code maintenance; it is just configuration in structured XML, so the complexity is much lower than maintaining the needles-and-shortcuts code base. We sometimes have several similar files, but they stay untouched for a long time. The opposite happens with the one-flow-with-needles approach, because there is a lot of spaghetti code (multi-product/multi-arch/multi-taste-of-tester, etc.) where you fix one thing and break another.

For a long time now, the Yam squad has not used needles and shortcuts for installation (to be honest, over the last year we have actually become less efficient at fixing those needles, because we didn't need to use testapi for that). We use the only tool that allows us to maintain the code properly, which is libyui-rest-api (it avoids those shortcut changes and other issues), and if we still use needles somewhere, it is because we haven't yet found the time to migrate to this technology. For that reason I can see that QE-Core is helping in those cases; historically they have had the image generation for other squads in their job group. We cannot have that; we cannot afford to maintain those dependencies for other squads due to capacity. But QE-Core is also migrating everything they can to AutoYaST, as far as I know.

By the way, we have been trying to help QE-Kernel with AutoYaST in O3 for s390x zVM for at least a month, trying all kinds of things, but it doesn't work due to the infrastructure. That is not the case here, though; this is a classic qemu one.

That said, and looking at how we could possibly help if this doesn't move forward so that kernel tests can be unblocked, I've just checked what Richard Fan was doing, and it seems there is an AutoYaST test suite which should be equivalent to the failing interactive one: https://openqa.opensuse.org/tests/3247995 I guess QE Kernel could use that one, after first asking Richard whether that is OK, in order to unblock this issue.
Let me know if something doesn't work and we could also help there.

#10

Updated by pcervinka almost 2 years ago

I don't understand where you are heading by suggesting that we use AY for jobs for which we are not primarily responsible. We shouldn't be responsible for generic image preparation jobs (unless they run on our specific hardware or relate to a very specific configuration). We have always reused existing images, like other teams. If someone decides to fix the reported problem, I don't mind how they do it. If the solution is to use AY, why not; we will just update the job dependencies. Or did something change at some point, so that each team should create its own qcow2 image preparation jobs?

#11

Updated by JERiveraMoya almost 2 years ago

pcervinka wrote:

I don't understand where you are heading by suggesting that we use AY for jobs for which we are not primarily responsible. We shouldn't be responsible for generic image preparation jobs (unless they run on our specific hardware or relate to a very specific configuration). We have always reused existing images, like other teams. If someone decides to fix the reported problem, I don't mind how they do it. If the solution is to use AY, why not; we will just update the job dependencies. Or did something change at some point, so that each team should create its own qcow2 image preparation jobs?

We have been sharing that info in the Weekly Sync for more than a year, and I think we have also talked about it in Slack in the past. The idea is that each squad becomes more and more independent and makes its own choices, having its own images if it runs something in the installed system.
You need to check with QE-Core (Richard's squad) about those generic images, as they are fine with maintaining them, but the Yam squad just needs to provide the test suite to avoid regressions and check that AutoYaST works in basic scenarios (without linking other squads' test suites to that one).

In other words, it is fine to have two test suites (jobs) doing the same thing. We might waste a few resources, but the management is cleaner, since each squad has its own stuff in its job group (that is not the case for openSUSE, OK, but we can check the maintainer). Editing a few XML files, filing AutoYaST bugs on their own, or even contacting the Yam squad for help or cc'ing us in the bug is the way to go. That way all squads can also be fairer with each other regarding existing technical debt in the code.

#12

Updated by pcervinka almost 2 years ago

@JERiveraMoya let's continue the discussion next week on the already scheduled call. Thank you.

#13

Updated by szarate almost 2 years ago

Related to action #128339: [qe-core][functional] children jobs use the qcow2 image which created via autoyast added

#14

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: install_ltp+opensuse+DVD
https://openqa.opensuse.org/tests/3311396#step/install_ltp/1

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released" or "EOL" (End-of-Life)
The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#15

Updated by okurz over 1 year ago

Category set to Bugs in existing tests

#16

Updated by slo-gin 12 months ago

This ticket was set to High priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

#17

Updated by slo-gin 10 months ago

Priority changed from High to Normal

The ticket will be set to the next lower priority Normal
