action #120570
closed[qe-core][functional][tools] test fails in bootloader because root device is not ready and it leads to kernel panic size:M
0%
Description
Observation
openQA test in scenario sle-15-SP5-Online-ppc64le-textmode+role_textmode@ppc64le-hmc fails in bootloader
Test suite description
Maintainers: QE Core, mgriessmeier
Like default but explicitly select the system role "textmode".
Reproducible
Fails since (at least) Build 40.1 (current job)
This seems to be a sporadic issue and needs further investigation.
Expected result
Last good: 38.1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by openqa_review almost 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode+role_textmode@ppc64le-hmc
https://openqa.suse.de/tests/10028741#step/bootloader/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by zluo almost 2 years ago
- Status changed from New to In Progress
- Assignee set to zluo
Taking over to check.
Updated by zluo almost 2 years ago
https://openqa.suse.de/tests/10028741#step/bootloader/24
It looks like the initrd cannot be loaded. Could this be a network issue with the NFS mount to the mnt directory?
Updated by zluo almost 2 years ago
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16029/ updated, but it does not cover all possible sporadic issues yet ;)
Updated by zluo almost 2 years ago
https://openqa.suse.de/tests/10106734#step/bootloader/45
This could be an acceptable case for retrying the load.
Updated by zluo almost 2 years ago
I think we have to live with it for now:
https://openqa.suse.de/tests/10152427
Retrying after reset_lpar_netboot still does not work and hits the timeout.
Updated by zluo almost 2 years ago
https://openqa.suse.de/tests/10164024#next_previous shows the latest test runs after the PR was updated for review.
Updated by zluo almost 2 years ago
- Related to action #122143: [qe-core][functional] test fails in bootloader because grub rescue mode entered due to network issue added
Updated by okurz over 1 year ago
- Status changed from Resolved to Feedback
Hi, this can't be resolved as long as there are soft-failure references to this ticket (https://openqa.suse.de/tests/10375315#step/bootloader/25), so please make sure the corresponding test code does not reference this or any other ticket in a soft-fail.
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode+role_textmode@ppc64le-hmc
https://openqa.suse.de/tests/10562883#step/bootloader/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 68 days if nothing changes in this ticket.
Updated by zluo over 1 year ago
I think the real root cause is the directory issue on qanet. If the network has a problem (for example, the mounted directory is not working), then we have trouble loading the initrd and the Linux kernel.
So I can remove the workaround for a test.
Updated by zluo over 1 year ago
https://progress.opensuse.org/issues/120570 this does not look good and it seems to be a new issue with the network.
Updated by zluo over 1 year ago
zluo wrote:
https://progress.opensuse.org/issues/120570 this does not look good and it seems to be a new issue with the network.
https://openqa.suse.de/tests/10807336 shows that grub menu data can be transferred and displayed. The network issue cannot be resolved by any workaround.
Updated by zluo over 1 year ago
https://openqa.suse.de/tests/10811737#next_previous shows some failures. This is definitely a network issue at the moment.
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode+role_textmode@ppc64le-hmc
https://openqa.suse.de/tests/10924667#step/bootloader/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by zluo over 1 year ago
re-triggered and it looks good:
Updated by openqa_review over 1 year ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode+role_textmode@ppc64le-hmc
https://openqa.suse.de/tests/10940988#step/bootloader/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by mgrifalconi over 1 year ago
- Status changed from Feedback to Workable
Hello, I don't see the reason to keep this ticket in feedback.
The record_softfailure will reopen the ticket automatically, as Oliver said.
To resolve the ticket we should remove the softfailure mark in the code and just retry on failures, or invest more in the problem and actually solve it.
There is no chance to fix this from our side; this is clearly a qanet and network issue.
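The "just retry on failures" option mentioned above could look roughly like the following. This is only a language-neutral sketch in Python, not the actual openQA test code: `attempt_boot` and `reset` are hypothetical callables standing in for the bootloader step and for something like reset_lpar_netboot, and the attempt limit is an arbitrary assumption.

```python
import time
from typing import Callable


def boot_with_retry(attempt_boot: Callable[[], bool],
                    reset: Callable[[], None],
                    max_attempts: int = 3,
                    delay_s: float = 0.0) -> int:
    """Try to boot up to max_attempts times, resetting between attempts.

    attempt_boot and reset are hypothetical stand-ins, not real openQA APIs.
    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError when every attempt failed, so the job fails visibly
    instead of recording a soft-failure that references a ticket.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_boot():
            return attempt
        if attempt < max_attempts:
            # Only reset when another attempt will actually follow.
            reset()
            time.sleep(delay_s)
    raise RuntimeError(f"boot failed after {max_attempts} attempts")
```

A bounded number of attempts keeps a persistent infrastructure problem from being hidden: once the limit is reached the failure surfaces as a normal test failure.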
Updated by zluo over 1 year ago
- Assignee changed from zluo to okurz
The root cause is on qanet, and the network issue happens sporadically. My previous workaround (retry, reset) cannot fix it.
So please help to fix the issue. I can remove the softfail of course, but then we go back to the problem we had before.
Updated by zluo over 1 year ago
- Category changed from Bugs in existing tests to Infrastructure
Updated by okurz over 1 year ago
- Tags changed from bugbusters to bugbusters, infra
- Subject changed from [qe-core][functional] test fails in bootloader because root device is not ready and it leads to kernel panic to [qe-core][functional][tools] test fails in bootloader because root device is not ready and it leads to kernel panic
- Status changed from Workable to New
- Assignee deleted (okurz)
- Priority changed from Normal to High
- Target version changed from QE-Core: Ready to Ready
Updated by okurz over 1 year ago
- Project changed from openQA Tests to openQA Project
- Due date set to 2023-06-23
- Category changed from Infrastructure to Support
- Status changed from New to Feedback
- Assignee set to okurz
zluo wrote:
https://progress.opensuse.org/issues/120570 this does not look good and it seems to be a new issue with the network.
- You are just referencing this ticket itself. Did you want to include another reference?
zluo wrote:
The root cause is on qanet, and the network issue happens sporadically. My previous workaround (retry, reset) cannot fix it.
So please help to fix the issue. I can remove the softfail of course, but then we go back to the problem we had before.
- Could you please share a bit more context on what you think the issue is?
Following the openQA test URL from #120570-19 in "Next & Previous", I find the latest job in this scenario failing with the same error symptoms:
https://openqa.suse.de/tests/11162717
In https://openqa.suse.de/tests/11162717#step/bootloader/25 I can see the job loading initrd from the file path "mnt/openqa/repo/SLE-15-SP5-Online-ppc64le-Build102.1-Media1/boot/ppc64le/initrd". That's a path on qanet relative to /srv/tftp. The file /srv/tftp/mnt/openqa/repo/SLE-15-SP5-Online-ppc64le-Build102.1-Media1/boot/ppc64le/initrd exists right now. It's "XZ compressed data", so I am pretty sure it is intact and could be read by grub; otherwise grub would have reported a read timeout or similar. Also, https://openqa.suse.de/tests/11176137#step/bootloader/25 on the same machine grenache-1:22 "redcurrant-2" passed and had no problems reading the same file.
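The file check described above can be made slightly stronger than a file-type lookup. The sketch below is an illustration and not something that was run for this ticket: it compares the first six bytes against the XZ stream header magic and then optionally decompresses the whole stream to catch truncation or corruption. The function names are hypothetical, and the path to check would have to be supplied by whoever runs it on qanet.

```python
import lzma

# XZ stream header magic bytes, as defined by the .xz file format.
XZ_MAGIC = bytes([0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00])


def looks_like_xz(path: str) -> bool:
    """Cheap check: does the file start with the XZ magic bytes?"""
    with open(path, "rb") as f:
        return f.read(len(XZ_MAGIC)) == XZ_MAGIC


def xz_stream_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Thorough check: decompress the whole file and discard the output.

    A truncated file raises EOFError and a corrupted one raises
    lzma.LZMAError; both are reported as "not intact".
    """
    try:
        with lzma.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
    except (lzma.LZMAError, EOFError):
        return False
    return True
```

A header match alone only shows that the beginning of the file is plausible; the full decompression pass is what would support the claim that the initrd is intact.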
How to reproduce?
To keep the overview I suggest you update the ticket description according to the template https://progress.opensuse.org/projects/openqav3/wiki/#Further-decision-steps-working-on-test-issues and follow https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to better understand the statistics
For further investigation I suggest scheduling only tests with the test module "bootloader" and with video enabled.
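For the statistical side, one possible way to quantify a sporadic failure is to run the reduced bootloader-only job many times and put a confidence interval around the observed failure rate. The sketch below uses the Wilson score interval; this is a generic statistical technique chosen here for illustration, not necessarily what the linked wiki section prescribes, and the run counts in the usage note are made up.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure rate.

    failures: number of failed runs, runs: total number of runs,
    z: normal quantile (1.96 corresponds to roughly 95% confidence).
    Returns (lower, upper) bounds on the true failure probability.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    p = failures / runs
    z2 = z * z
    denom = 1 + z2 / runs
    centre = (p + z2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z2 / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, `wilson_interval(3, 20)` gives bounds of roughly 5% and 36%, which shows how little 20 runs say about a sporadic issue and why a larger sample is needed before declaring it fixed.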
Updated by livdywan over 1 year ago
- Subject changed from [qe-core][functional][tools] test fails in bootloader because root device is not ready and it leads to kernel panic to [qe-core][functional][tools] test fails in bootloader because root device is not ready and it leads to kernel panic size:M
Updated by okurz over 1 year ago
- Priority changed from High to Normal
reducing prio as there is apparently less interest from reporter.
Updated by okurz over 1 year ago
- Status changed from Feedback to Resolved
I assume the problem resolved itself because unfortunately there is no further response. I checked if there are any recent job labels using this ticket, but openqa-query-for-job-label 120570 shows that we are good:
11162717|2023-05-19 15:17:50|done|failed|textmode+role_textmode||grenache-1
11146733|2023-05-17 03:26:25|done|failed|textmode+role_textmode||grenache-1
11140430|2023-05-16 15:01:14|done|failed|textmode+role_textmode||grenache-1
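Output like the lines above is easy to post-process when checking whether any recent job still carries the label. The following sketch parses the pipe-separated lines; the field names are guesses based only on the three lines shown here (the sixth field is empty in all of them and its meaning is unknown), so treat them as assumptions rather than the tool's documented format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class LabelledJob:
    # Field names are assumptions inferred from the sample output.
    job_id: int
    timestamp: datetime
    state: str
    result: str
    test: str
    unknown: str  # empty in the sample lines; meaning not known
    host: str


def parse_label_query(output: str) -> List[LabelledJob]:
    """Parse 'id|timestamp|state|result|test|?|host' lines into records."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        if len(fields) != 7:
            raise ValueError(f"unexpected number of fields: {line!r}")
        job_id, ts, state, result, test, unknown, host = fields
        jobs.append(LabelledJob(
            job_id=int(job_id),
            timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
            state=state,
            result=result,
            test=test,
            unknown=unknown,
            host=host,
        ))
    return jobs


def newest(jobs: List[LabelledJob]) -> LabelledJob:
    """Return the job with the most recent timestamp."""
    return max(jobs, key=lambda job: job.timestamp)
```

Comparing the timestamp of `newest(...)` with the current date is one way to decide whether the label is still in active use.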
Updated by openqa_review about 1 year ago
- Status changed from Resolved to Feedback
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode+role_textmode@ppc64le-hmc
https://openqa.suse.de/tests/11162717#step/bootloader/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by okurz about 1 year ago
- Due date deleted (2023-06-23)
- Status changed from Feedback to Resolved
reminded rfan1 about the SLE15-SP6 setup in #131531 which is relevant here. That might be enough.