action #169249
coordination #154768: [saga][epic][ux] State-of-art user experience for openQA
coordination #157345: [epic] Improved test reviewer user experience
[sporadic] openqa_install_multimachine test fails in test_running - ping test fails auto_review:"Test died: command[\s\S]*openqa-cli api jobs" size:S
0%
Description
Observation
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install_multimachine@64bit-4G fails in test_running
Reproducible
Fails since (at least) Build :TW.32491 (current job)
Expected result
Last good: :TW.32490 (or more recent)
Further details
Always latest result in this scenario: latest
Suggestions
- Ensure test_running-testresults.tar.gz contains image files and no (broken) symlinks
- Maybe make the contained needles visible in the "outer" job (e.g. https://openqa.opensuse.org/tests/4643016#step/tests/1 )
Updated by tinita about 2 months ago
- Subject changed from [sporadic] [openqa-in-openqa] test fails in test_running - ping test fails to [sporadic] [openqa-in-openqa] openqa_install_multimachine test fails in test_running - ping test fails
Updated by tinita about 2 months ago
- Related to action #169204: [sporadic] [openqa-in-openqa] openqa_install_multimachine test fails in test_running - taking too long until test is running size:S added
Updated by tinita about 1 month ago
I had a look at the logfiles, but was only able to spot a failing needle match.
We are uploading the test results directory. However, the screenshots are symlinks to /var/lib/openqa/..., so they are not part of the uploaded tarball and we can't see what the screen looked like.
Maybe that can be improved.
We haven't seen the failure again so far.
Updated by okurz about 1 month ago
- Status changed from New to Resolved
- Assignee set to okurz
ok, thank you for looking into that. sometimes we just have to accept individual, spurious failures. Considering that otherwise https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-4G&test=openqa_install_multimachine&version=Tumbleweed#next_previous looks very green we should be ok to accept as is.
Updated by tinita about 1 month ago
- Status changed from Resolved to New
It happened again: http://openqa.opensuse.org/tests/4643020
Updated by livdywan about 1 month ago
- Subject changed from [sporadic] [openqa-in-openqa] openqa_install_multimachine test fails in test_running - ping test fails to openqa_install_multimachine test fails in test_running - ping test fails size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 month ago
- Subject changed from openqa_install_multimachine test fails in test_running - ping test fails size:S to [sporadic] openqa_install_multimachine test fails in test_running - ping test fails size:S
- Priority changed from High to Normal
sporadic but shouldn't bother us over hack week and this is now mostly about test code improvement anyway.
Updated by okurz 17 days ago
- Related to action #170296: [openqa-in-openqa][sporadic] test fails in test_running - ping_client is not complete size:S added
Updated by okurz 10 days ago
- Priority changed from Normal to Urgent
From today: https://openqa.opensuse.org/tests/4695731
Updated by tinita 9 days ago
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/217 Upload list of jobs for easier debugging
Updated by ybonatakis 9 days ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review 8 days ago
- Due date set to 2024-12-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 8 days ago · Edited
- Due date deleted (2024-12-27)
- Status changed from In Progress to Resolved
No failures on the main jobs on O3
Then we have the logs of the cloned jobs, including some screenshots. They are not helpful, though, because they apparently were not uploaded properly.
When I try to open them I see
Extraction of the entry:
‘testresults/00000/00000001-opensuse-Tumbleweed-DVD-x86_64-Build20241209-ping_server@64bit/boot_to_desktop-2.png’
failed with the error message:
Hard-link target 'testresults/00000/00000002-opensuse-Tumbleweed-DVD-x86_64-Build20241209-ping_client@64bit/boot_to_desktop-1.png' does not exist.
Do you want to continue extraction?
The retry is terminated after a few attempts. Not sure where this termination comes from tho.
I wonder whether setting different retry params would change anything.
However I will resolve this for now, considering that we have #169249#note-14 and can provide some info next time
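The extraction error above suggests the tarball contains link entries whose targets are not inside it. A minimal sketch of how such entries could be listed with Python's standard `tarfile` module follows. The function name `dangling_links`, the demo archive and all member paths in it are made up for illustration; nothing here is taken from the test code.

```python
import io
import posixpath
import tarfile


def dangling_links(archive: tarfile.TarFile) -> list[str]:
    """List symlink and hard-link members whose target is not in the archive."""
    members = archive.getmembers()
    names = {m.name for m in members}
    problems = []
    for member in members:
        if member.issym():
            # Symlink targets are relative to the directory of the link itself;
            # an absolute target can never be satisfied from inside the archive.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(member.name), member.linkname)
            )
            if target not in names:
                problems.append(f"symlink: {member.name} -> {member.linkname}")
        elif member.islnk() and member.linkname not in names:
            # Hard-link targets are stored as full member names.
            problems.append(f"hard link: {member.name} -> {member.linkname}")
    return problems


def build_demo_archive() -> bytes:
    """Build a tiny in-memory archive with hypothetical members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"fake image bytes"
        regular = tarfile.TarInfo("testresults/a.png")
        regular.size = len(data)
        tar.addfile(regular, io.BytesIO(data))

        sym = tarfile.TarInfo("testresults/b.png")
        sym.type = tarfile.SYMTYPE
        sym.linkname = "/outside/the/archive/b.png"  # hypothetical path
        tar.addfile(sym)

        hard = tarfile.TarInfo("testresults/c.png")
        hard.type = tarfile.LNKTYPE
        hard.linkname = "testresults/missing.png"  # not part of the archive
        tar.addfile(hard)
    return buf.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_demo_archive())) as tar:
    report = dangling_links(tar)

for line in report:
    print(line)
```

Running something like this against a downloaded test_running-testresults.tar.gz would show whether the screenshots are stored as real files or only as links.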
Updated by tinita 8 days ago
- Status changed from Resolved to Workable
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/217 will not bring any new information. It's just that so far we only had the screenshot of the non-pretty json output. Hard to spot the status, result, reason on that. The uploaded pretty-printed json will just make it a bit nicer.
Updated by tinita 8 days ago
ybonatakis wrote in #note-17:
The retry is terminated after a few attempts. Not sure where this termination comes from tho.
It is terminated because https://openqa.opensuse.org/tests/4695731#step/test_running/4 is waiting for a finished, passed job. When the job is failed or incomplete, the retry will stop and fail the test.
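To illustrate the behaviour described here, the following is a rough sketch of such a wait loop. It is not the actual test code (which uses `retry` with `openqa-cli`); `wait_for_passed` and `fake_fetcher` are hypothetical helpers, and the job is assumed to be a dict with `state` and `result` keys.

```python
import time
from typing import Callable, Iterable


def wait_for_passed(fetch_job: Callable[[], dict], attempts: int = 5,
                    delay: float = 0.0) -> bool:
    """Poll a job until it is done; succeed only if it passed.

    A job that finishes with any other result (for example failed or
    incomplete) ends the polling immediately instead of being retried.
    """
    for _ in range(attempts):
        job = fetch_job()
        if job["state"] == "done":
            if job["result"] == "passed":
                return True
            raise RuntimeError(f"job finished with result {job['result']!r}")
        time.sleep(delay)
    raise TimeoutError(f"job not done after {attempts} attempts")


def fake_fetcher(responses: Iterable[dict]) -> Callable[[], dict]:
    """Return a callable that yields canned job states, one per call."""
    iterator = iter(responses)
    return lambda: next(iterator)


# A job that is still running on the first poll and passes on the second.
ok = wait_for_passed(fake_fetcher([
    {"state": "running", "result": "none"},
    {"state": "done", "result": "passed"},
]))
print(ok)

# A job that ends up incomplete stops the loop on the first "done" answer.
try:
    wait_for_passed(fake_fetcher([{"state": "done", "result": "incomplete"}]))
    stopped_early = False
except RuntimeError as err:
    stopped_early = True
    print(err)
```

In this reading, raising the number of attempts would not help once the inner job has already failed, because the loop ends on the final result and not on the attempt count.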
Updated by ybonatakis 5 days ago
@tinita the openqa_install_multimachine test hasn't failed in the last 5 days (since 4695731). I suggest resolving the ticket for now unless you have a better idea. I don't think there is anything else I can actually do.
Updated by okurz 5 days ago
fail rate seems to be around 1/300 based on https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-4G&test=openqa_install_multimachine&version=Tumbleweed#next_previous
Updated by ybonatakis 5 days ago
- Subject changed from [sporadic] openqa_install_multimachine test fails in test_running - ping test fails size:S to [sporadic] openqa_install_multimachine test fails in test_running - ping test fails auto_review:"Test died: command.*retry" size:S
- Status changed from Workable to In Progress
add auto_review to prevent further notifications
Updated by ybonatakis 5 days ago
- Subject changed from [sporadic] openqa_install_multimachine test fails in test_running - ping test fails auto_review:"Test died: command.*retry" size:S to [sporadic] openqa_install_multimachine test fails in test_running - ping test fails auto_review:"Test died: command[\s\S]*openqa-cli api jobs" size:S
update the auto_review
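The ticket does not say why the expression was changed. One plausible reading is that `.` stops at line breaks while `[\s\S]` does not, and that the new expression is tied to the `openqa-cli api jobs` call instead of any `retry` command. The sketch below checks this with Python's `re` module, assuming the matching behaves the same way there. The multi-line sample is made up; the clone-job message is the one quoted later in this ticket.

```python
import re

OLD = re.compile(r"Test died: command.*retry")
NEW = re.compile(r"Test died: command[\s\S]*openqa-cli api jobs")
# Same as NEW but with "." in place of "[\s\S]", for comparison only.
NEW_WITH_DOT = re.compile(r"Test died: command.*openqa-cli api jobs")

# Hypothetical message in which the command spans more than one line.
multiline_msg = "Test died: command 'some-wrapper\n  openqa-cli api jobs' failed"

# Message from a later comment in this ticket (a different failure).
clone_msg = (
    "Test died: command 'retry -e -- openqa-clone-job --show-progress "
    "--skip-chained-deps --from http://openqa.opensuse.org $job_id' "
    "timed out at /usr/lib/os-autoinst/autotest.pm line 411."
)

dot_spans_lines = NEW_WITH_DOT.search(multiline_msg) is not None
class_spans_lines = NEW.search(multiline_msg) is not None
old_matches_clone = OLD.search(clone_msg) is not None
new_matches_clone = NEW.search(clone_msg) is not None

print("'.' crosses the line break:     ", dot_spans_lines)
print("'[\\s\\S]' crosses the line break:", class_spans_lines)
print("old expression matches clone-job timeout:", old_matches_clone)
print("new expression matches clone-job timeout:", new_matches_clone)
```

Under these assumptions the older expression would also label the `openqa-clone-job` timeout with this ticket, while the newer one would not.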
Updated by openqa_review 4 days ago
- Due date set to 2024-12-31
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 4 days ago
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/218 for
okurz wrote in #note-26:
For the suggestion of the ticket
- Ensure test_running-testresults.tar.gz contains image files and no (broken) symlinks
From the tar help:
-h, --dereference   follow symlinks; archive and dump the files they point to
That could help.
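The effect of dereferencing can be tried out locally. The sketch below uses Python's `tarfile` module, whose `dereference` option plays a role similar to `tar -h`. It is only an illustration of the idea and not the change in the pull request; the directory layout and file names are invented, and it needs a platform that supports symlinks.

```python
import os
import tarfile
import tempfile


def archive_results(results_dir: str, archive_path: str,
                    follow_symlinks: bool) -> None:
    """Pack results_dir into a gzipped tarball, optionally following symlinks."""
    with tarfile.open(archive_path, "w:gz", dereference=follow_symlinks) as tar:
        tar.add(results_dir, arcname="testresults")


def member_kinds(archive_path: str) -> dict[str, str]:
    """Map each member name to 'symlink', 'file' or 'other'."""
    kinds = {}
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if member.issym():
                kinds[member.name] = "symlink"
            elif member.isfile():
                kinds[member.name] = "file"
            else:
                kinds[member.name] = "other"
    return kinds


with tempfile.TemporaryDirectory() as tmp:
    pool = os.path.join(tmp, "pool")            # stands in for the real image store
    results = os.path.join(tmp, "testresults")  # directory that gets uploaded
    os.mkdir(pool)
    os.mkdir(results)
    with open(os.path.join(pool, "shot.png"), "wb") as fh:
        fh.write(b"fake image bytes")
    # The results directory only holds a symlink to the image.
    os.symlink(os.path.join(pool, "shot.png"), os.path.join(results, "shot.png"))

    plain = os.path.join(tmp, "plain.tar.gz")
    deref = os.path.join(tmp, "deref.tar.gz")
    archive_results(results, plain, follow_symlinks=False)
    archive_results(results, deref, follow_symlinks=True)

    plain_kinds = member_kinds(plain)
    deref_kinds = member_kinds(deref)

print("without dereference:", plain_kinds["testresults/shot.png"])
print("with dereference:   ", deref_kinds["testresults/shot.png"])
```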
Updated by livdywan 3 days ago
ybonatakis wrote in #note-27:
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/218 for
With this merged, I'd suggest running 300 jobs or so to reproduce and get a failure with the missing files. Hopefully that will yield a clue on the underlying issue.
Maybe make the contained needles visible in the "outer" job (e.g. https://openqa.opensuse.org/tests/4643016#step/tests/1 )
If this is tricky locally, feel free to run it on o3 or open platform
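As a rough guide for how many runs are needed, the fail rate of about 1/300 estimated earlier in this ticket can be turned into a probability of seeing at least one failure. The sketch assumes that runs fail independently with a constant rate, which may not hold for infrastructure-related issues; the function names are made up.

```python
import math


def p_at_least_one_failure(fail_rate: float, runs: int) -> float:
    """Probability of one or more failures in `runs` independent runs."""
    return 1 - (1 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of runs that reaches the wanted probability of a failure."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - fail_rate))


rate = 1 / 300
p_300 = p_at_least_one_failure(rate, 300)
n_95 = runs_needed(rate, 0.95)

print(f"chance of at least one failure in 300 runs: {p_300:.0%}")
print(f"runs needed for a 95% chance: {n_95}")
```

Under these assumptions 300 runs give only about a 63% chance of hitting the failure once, and roughly 900 runs would be needed for a 95% chance.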
Updated by ybonatakis 2 days ago · Edited
livdywan wrote in #note-28:
ybonatakis wrote in #note-27:
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/218 for
With this merged, I'd suggest running 300 jobs or so to reproduce and get a failure with the missing files. Hopefully that will yield a clue on the underlying issue.
Maybe make the contained needles visible in the "outer" job (e.g. https://openqa.opensuse.org/tests/4643016#step/tests/1 )
If this is tricky locally, feel free to run it on o3 or open platform
Updated by livdywan about 24 hours ago · Edited
Looks like there's a bunch of jobs now that fail in start_test, timing out in openqa-clone-job, but this failure seems consistent:
# Test died: command 'retry -e -- openqa-clone-job --show-progress --skip-chained-deps --from http://openqa.opensuse.org $job_id' timed out at /usr/lib/os-autoinst/autotest.pm line 411.
Is this the same underlying issue? And if not I wonder why we only see this one now 🤔