action #43880
closed: [functional][u][s390x][sporadic] test fails in shutdown on s390x
Added by oorlov about 6 years ago. Updated over 5 years ago.
Description
Observation
openQA test in scenario sle-15-SP1-Installer-DVD-s390x-create_hdd_gnome@s390x-kvm-sle12 fails in shutdown
Reproducible
Fails since (at least) Build 96.7 (current job)
Expected result
Last good: 95.1 (or more recent)
Further details
- Other job runs failed with the same issue: https://openqa.suse.de/tests/2262190#step/shutdown/8, https://openqa.suse.de/tests/2262169#step/shutdown/9
- Always latest result in this scenario: latest
Updated by okurz about 6 years ago
- Status changed from New to Workable
- Priority changed from Normal to High
hm, there is check_shutdown(60), which again is not that long. I recommend bumping the timeout and cross-checking with the code branches which might already wait longer.
Updated by mgriessmeier about 6 years ago
- Status changed from Workable to In Progress
- Assignee set to mgriessmeier
I will take a look into this,
though the 60s seems to have worked in the past - might be a different issue - let's investigate
Updated by okurz about 6 years ago
try SCHEDULE=tests/boot/boot_to_desktop,tests/shutdown/shutdown, or add in "bootloader_zkvm" in there.
Updated by mgriessmeier about 6 years ago
okurz wrote:
try SCHEDULE=tests/boot/boot_to_desktop,tests/shutdown/shutdown, or add in "bootloader_zkvm" in there.
The problem is that the only shutdown failures I can find are in create_hdd jobs, so we might miss the issue if we just use a qcow image.
So I will first schedule the whole job once and grab the image from there.
Updated by okurz about 6 years ago
well, I thought the "boot+shutdown" test would be super-fast to finish, so you can easily schedule 100 runs on production and confirm or reject the hypothesis that the issue can also appear in the image-booting scenarios.
Updated by mgriessmeier about 6 years ago
triggered 50 jobs with a custom schedule - so far 2 of 11 failed =)
http://opeth.suse.de/tests/overview?build=107.5&distri=sle&version=15-SP1
Updated by mgriessmeier about 6 years ago
so 8 out of 50 failed, I will investigate those and trigger 50 more with increased timeout to see if the failure rate is reduced or completely gone
Updated by mgriessmeier about 6 years ago
ok, with the doubled timeout it's 1 failure out of 50...
I will go with a tripled timeout (*3), which we also use for other cases...
Updated by okurz about 6 years ago
- Related to coordination #35215: [functional][u][epic][medium] test fails on shutdown module added
Updated by okurz about 6 years ago
- Related to action #43064: [functional][u] test fails in boot_into_snapshot and reboot_gnome with encrypted setup - test not waiting long enough for shutdown/reboot until we end up in grub which shows in post_fail_hook added
Updated by okurz about 6 years ago
- Related to action #43616: [functional][u][sporadic] test fails in shutdown - SUT took longer than 60 seconds to shutdown, no logs available (non-s390x) added
Updated by okurz about 6 years ago
- Related to action #41183: [functional][u] soft-fail in shutdown should have valid bugref added
Updated by okurz about 6 years ago
- Related to action #42038: [sle][functional[u] test fails in shutdown - add post_fail_hook for shutdown module if possible added
Updated by okurz about 6 years ago
- Related to action #38108: [openqa][kernel] Power down needs to have longer timeout due longer shutdown of BTRFS related service added
Updated by okurz about 6 years ago
- Related to action #35892: [qe-core][functional][hard] test fails in kdump_and_crash - improve bootup/shutdown debugging approach added
Updated by okurz about 6 years ago
custom scheduling rocks, right? ;)
Good evaluation. Please see the related issues as well. On the one hand we should fix the failing test, which we can do by bumping the timeout; on the other hand we should provide bug reports or help with the investigation of already existing ones. The approach you found might be especially helpful for that.
@oorlov, as the "shutdown specialist", what suggestions can you give on how to investigate the issue further or how to track the corresponding bugs with soft-fails?
@mgriessmeier maybe just find the corresponding bugs about "shutdown takes too long" and comment there with suggestions on how to investigate, e.g. point to the openqa-clone-job commands you used to reproduce the issue easily.
Updated by mgriessmeier about 6 years ago
- Priority changed from High to Normal
The triple timeout still fails 2 out of 50 runs, but since it improves the situation significantly, here is the PR:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6374
Therefore reducing urgency and continuing with the suggestions from okurz.
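The three measurements reported in this ticket (8, 1 and 2 failures out of 50 runs with the original, doubled and tripled timeout) can be put side by side with a small calculation. The following is only an illustrative Python sketch, not part of the test code: the function names are made up here, and using a Wilson score interval to judge the uncertainty of such small samples is an added assumption.

```python
import math


def failure_rate(failed: int, total: int) -> float:
    """Observed fraction of failed runs."""
    return failed / total


def wilson_interval(failed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the true failure rate."""
    p = failed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# (failed, total) per timeout setting, as reported in the comments above
RUNS = {
    "original timeout": (8, 50),
    "doubled timeout": (1, 50),
    "tripled timeout": (2, 50),
}

for label, (failed, total) in RUNS.items():
    low, high = wilson_interval(failed, total)
    rate = failure_rate(failed, total)
    print(f"{label}: {rate:.0%} failed, plausible range {low:.1%} to {high:.1%}")
```

With only 50 runs per setting the intervals for the doubled and the tripled timeout overlap, so the difference between 1 and 2 failures should not be read as one setting being better than the other.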
Updated by mgriessmeier about 6 years ago
- Status changed from In Progress to Workable
- Assignee deleted (mgriessmeier)
Urgency was removed - not working on the followup atm
Updated by okurz almost 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_textmode@zkvm
https://openqa.suse.de/tests/2347165
Updated by okurz almost 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_gnome
https://openqa.suse.de/tests/2369949
Updated by zluo almost 6 years ago
My PR (https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6702 , see #42038 for details) might be helpful to check the cause of the shutdown issue.
Updated by szarate almost 6 years ago
I wonder if setting the $soft_fail_data hash to have a hard timeout of 180 and a soft timeout of 60 would work in this case. Today we got 3 failures so far, most of them taking on the order of 2 minutes between the moment the poweroff command is sent and the moment virsh actually reports that the SUT is shut off...
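The soft/hard timeout idea can be illustrated with a short sketch. It is written in Python for illustration only and does not reflect how $soft_fail_data or check_shutdown are actually implemented in the test distribution; wait_for_shutdown and its parameters are hypothetical names, and is_shut_off stands in for whatever check (e.g. a virsh state query) reports the SUT as shut off.

```python
import time
from typing import Callable


def wait_for_shutdown(
    is_shut_off: Callable[[], bool],
    soft_timeout: float = 60.0,
    hard_timeout: float = 180.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the machine is off and classify how long it took.

    Returns "ok" if the shutdown finished within soft_timeout,
    "softfail" if it finished later but before hard_timeout,
    and "fail" if it was still running when hard_timeout was reached.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if is_shut_off():
            return "ok" if elapsed <= soft_timeout else "softfail"
        if elapsed >= hard_timeout:
            return "fail"
        sleep(poll_interval)
```

Under these assumptions a shutdown that takes about 2 minutes, like the failures mentioned above, would be recorded as a soft-fail instead of failing the job, while anything beyond 180 seconds would still fail hard.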
Updated by okurz almost 6 years ago
- Target version changed from Milestone 22 to Milestone 23
Updated by SLindoMansilla almost 6 years ago
- Subject changed from [functional][u][s390x] test fails in shutdown on s390x to [functional][u][s390x][sporadic] test fails in shutdown on s390x
New occurrence on OSD: https://openqa.suse.de/tests/2496334
It seems to be sporadic.
Updated by zluo almost 6 years ago
- Assignee set to zluo
Yesterday I filed ticket #48545; maybe it is a related issue. Let me check and give an update here.
Updated by zluo almost 6 years ago
http://openqa.suse.de/tests/2508379#next_previous
100 test runs triggered now
Updated by zluo almost 6 years ago
WIP PR:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6940
with extended timeout for check_shutdown:
Created job #2508629: sle-15-SP1-Installer-DVD-s390x-Build178.3-create_hdd_gnome@s390x-kvm-sle12 -> https://openqa.suse.de/t2508629
Updated by okurz almost 6 years ago
- Priority changed from Normal to High
back to "High" as still or again linked to currently failing tests.
Updated by okurz almost 6 years ago
- Related to action #46076: [sle][functional][u][medium] test fails in shutdown on minimalx added
Updated by zluo almost 6 years ago
My WIP PR, tested on OSD in https://openqa.suse.de/t2508629, shows no issue. On my local server I tested as well:
http://f40.suse.de/tests/1526#next_previous shows 100 test runs in total, all successful in shutdown; a couple of tests failed at the very beginning in bootloader_zkvm, which is a different issue.
Updated by zluo almost 6 years ago
- Status changed from In Progress to Feedback
waiting for merging PR.
Updated by okurz almost 6 years ago
merged and retriggered all three s390x shutdown failures in https://openqa.suse.de/tests/overview?result=none&result=failed&result=incomplete&result=skipped&result=obsoleted&result=parallel_failed&result=parallel_restarted&result=user_cancelled&result=user_restarted&arch=&modules=&distri=sle&version=15-SP1&build=190.3&groupid=112&groupid=110#
Updated by SLindoMansilla over 5 years ago
I don't know how to interpret the link from okurz in https://progress.opensuse.org/issues/43880#note-36
Waiting for verification run on OSD: https://openqa.suse.de/tests/overview?build=poo43880_osd_verification
Updated by zluo over 5 years ago
- Status changed from Feedback to Resolved
https://openqa.suse.de/tests/2725859#step/shutdown/8 shows successful test run.
Updated by SLindoMansilla over 5 years ago
- Status changed from Resolved to Feedback
That is not enough, because we need to verify a sporadic bug.
We need to verify with 100 jobs.
Waiting for: https://openqa.suse.de/tests/overview?build=poo43880_osd_verification
Please feel free to assign the ticket to me if you don't want to keep it, so I can take care of updating the status when the jobs finish.
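Why a single green run says little about a sporadic failure can be shown with a short calculation. The sketch below is in Python and assumes that runs are independent and fail with a fixed probability; the rates used are the ones measured earlier in this ticket (8 of 50 with the original timeout, 2 of 50 with the tripled one), and the function names are hypothetical.

```python
import math


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that every one of `runs` independent runs passes."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of runs for which a bug that is still present at
    `failure_rate` would show at least one failure with probability
    >= `confidence`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Failure rates taken from the measurements reported in this ticket
for label, rate in {"original timeout": 8 / 50, "tripled timeout": 2 / 50}.items():
    print(
        f"{label}: one run passes with {prob_all_pass(rate, 1):.0%}, "
        f"100 runs all pass with {prob_all_pass(rate, 100):.2%}, "
        f"runs needed for 95% confidence: {runs_needed(rate)}"
    )
```

Under these assumptions a bug that still fails 2 out of 50 times lets a single run pass 96% of the time, while 100 passes in a row would be unlikely (below 2%), which is the reasoning behind asking for a larger verification batch.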
Updated by zluo over 5 years ago
I already checked this with over 100 runs: http://f40.suse.de/tests/1526#next_previous
Please see my previous comment; this has also already been tested on OSD with the WIP PR.
I think this is enough, but if you want to test more, please go ahead...
Updated by zluo over 5 years ago
- Status changed from Feedback to Resolved
verification tests on osd look good. Resolved.