action #166445
openopenQA Project - coordination #127031: [saga][epic] openQA for SUSE customers
openQA Project - coordination #165393: [epic] Improved code coverage in openQA
[openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M
0%
Description
Observation¶
The inner test is incomplete despite the only step simple_boot is passed.
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in
tests
autoinst-log.txt for the inner tests shows:
[2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0
…
[2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM
[2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder
[2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video
[2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated
[2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend)
[2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit
[2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process
51558: EXIT 1
[2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1
[2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++
[2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31
[2024-09-11T07:00:31.965147-04:00] [info] Result: died
[2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv
[2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt
also from the worker
[2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59)
Reproducible¶
Fails since (at least) Build :TW.31137 (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous it seems the job fails the same at least 1/100 times.
Expected result¶
Last good: :TW.31136 (or more recent)
Suggestions¶
- Take a look into the inner job's autoinst-log.txt and also consider to upload the worker-log.txt
- Consider compressing and uploading the whole results directory
- Look into the error message visible in https://openqa.opensuse.org/tests/4471721/logfile?filename=tests-journal.log.txt#line-5968
Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+Sep 11 10:04:18 susetest worker[124339]: [info] 124339: WORKING 1
Further details¶
Always latest result in this scenario: latest
Updated by jbaier_cz 2 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-(test-details|search-results).*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched"
Updated by jbaier_cz 2 months ago
- Related to action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S added
Updated by jbaier_cz 2 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
Updated by tinita 2 months ago
Just a small PR so we have the actual logfile for investigation: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/206 Correct uploading of autoinst-log.txt
Updated by jbaier_cz 2 months ago
Example of failure with the autoinst-log.txt attached: https://openqa.opensuse.org/tests/4471721
Updated by okurz 2 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by ybonatakis 2 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review 2 months ago
- Due date set to 2024-09-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 2 months ago
- Due date deleted (
2024-09-27)
Updated by openqa_review 2 months ago
- Due date set to 2024-10-01
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 2 months ago
- Status changed from In Progress to Feedback
baseclass calls stop_vm the qemu error comes between $self->_stop_video_encoder();
and $self->close_pipes();
messages.
However before we terminate test runner and isotovideo. The error code is assigned when $runner->testfd
. the code is commented with unusual shutdown
Updated by ybonatakis 2 months ago
Updated by ybonatakis 2 months ago · Edited
Updated by ybonatakis 2 months ago
ybonatakis wrote in #note-15:
I set up a virtual machine and I compared the logs:
{{collapse(failed job logs)
[1;34m[2024-09-16T09:38:21.788738-04:00] [debug] [pid:124359] ||| finished boot tests (runtime: 11 s)[0m
[37m[2024-09-16T09:38:21.793705-04:00] [debug] [pid:124339] [autotest] process exited: 0[0m
[37m[2024-09-16T09:38:21.794268-04:00] [debug] [pid:124339] stopping command server 124356 because test execution ended[0m
[37m[2024-09-16T09:38:21.794336-04:00] [debug] [pid:124339] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/IUjZf4snrxneedUn/broadcast[0m
[37m[2024-09-16T09:38:21.821683-04:00] [debug] [pid:124339] commands process exited: 0[0m
[37m[2024-09-16T09:38:21.921957-04:00] [debug] [pid:124339] done with command server[0m
[37m[2024-09-16T09:38:21.922056-04:00] [debug] [pid:124339] stopping autotest process 124359[0m
[37m[2024-09-16T09:38:21.922138-04:00] [debug] [pid:124339] done with autotest process[0m
[37m[2024-09-16T09:38:21.922191-04:00] [debug] [pid:124339] isotovideo failed[0m
[37m[2024-09-16T09:38:21.923372-04:00] [debug] [pid:124339] stopping backend process 124360[0m
[37m[2024-09-16T09:38:21.923566-04:00] [debug] [pid:124360] backend got TERM[0m
[33m[2024-09-16T09:38:21.924008-04:00] [info] [pid:124360] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json[0m
[37m[2024-09-16T09:38:22.978555-04:00] [debug] [pid:124360] Passing remaining frames to the video encoder[0m
[out#0/webm @ 0x555a322309c0] video:20KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 3.039424%
frame= 24 fps=2.0 q=50.0 Lsize= 20KiB time=00:00:00.95 bitrate= 174.3kbits/s speed=0.0779x
[37m[2024-09-16T09:38:24.081336-04:00] [debug] [pid:124360] Waiting for video encoder to finalize the video[0m
[37m[2024-09-16T09:38:24.081465-04:00] [debug] [pid:124360] The external video encoder (pid 124362) terminated[0m
[37m[2024-09-16T09:38:24.081668-04:00] [debug] [pid:124360] The built-in video encoder (pid 124363) terminated[0m
[37m[2024-09-16T09:38:24.082421-04:00] [debug] [pid:124360] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124360 (/usr/bin/isotovideo: backend)[0m
[37m[2024-09-16T09:38:24.083632-04:00] [debug] [pid:124360] sending magic and exit[0m
[37m[2024-09-16T09:38:24.228532-04:00] [debug] [pid:124339] done with backend process[0m
124339: EXIT 1
[2024-09-16T09:38:24.283590-04:00] [info] Isotovideo exit status: 1
}}{{collapse(good job logs)
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
[2024-09-17T04:27:18.242803-04:00] [debug] [pid:10746] [autotest] process exited: 0
[2024-09-17T04:27:18.346308-04:00] [debug] [pid:10746] done with autotest process
[2024-09-17T04:27:18.347247-04:00] [debug] [pid:10746] stopping command server 10775 because test execution ended
[2024-09-17T04:27:18.347427-04:00] [debug] [pid:10746] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/1BzRWhMNYgs94svW/broadcast
[2024-09-17T04:27:18.499693-04:00] [debug] [pid:10746] commands process exited: 0
[2024-09-17T04:27:18.600627-04:00] [debug] [pid:10746] done with command server
[2024-09-17T04:27:18.600955-04:00] [debug] [pid:10746] isotovideo done
[2024-09-17T04:27:18.608242-04:00] [debug] [pid:10786] QEMU status is not 'shutdown', it is 'running'
[2024-09-17T04:27:18.609286-04:00] [debug] [pid:10746] backend shutdown state: 0
[2024-09-17T04:27:18.611449-04:00] [info] [pid:10786] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-17T04:28:01.158750-04:00] [debug] [pid:10786] Passing remaining frames to the video encoder[out#0/webm @ 0x55a05fdbf9c0] video:12KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 6.289207%frame= 66 fps=0.9 q=50.0 Lsize= 13KiB time=00:00:02.70 bitrate= 39.0kbits/s speed=0.0362x
[2024-09-17T04:28:02.324960-04:00] [debug] [pid:10786] Waiting for video encoder to finalize the video
[2024-09-17T04:28:02.325313-04:00] [debug] [pid:10786] The built-in video encoder (pid 10791) terminated
[2024-09-17T04:28:02.325423-04:00] [debug] [pid:10786] The external video encoder (pid 10790) terminated
[2024-09-17T04:28:02.332402-04:00] [debug] [pid:10786] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 10786 (/usr/bin/isotovideo: backend)
[2024-09-17T04:28:02.335732-04:00] [debug] [pid:10786] sending magic and exit
[2024-09-17T04:28:02.336304-04:00] [debug] [pid:10746] received magic close
[2024-09-17T04:28:02.343881-04:00] [debug] [pid:10746] backend process exited: 0
[2024-09-17T04:28:02.450525-04:00] [debug] [pid:10746] stopping backend process 10786
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process
}}
autotest process seems to start termination earlier but I am not sure. I tried to dig into the code but there are different places which are executing tasks at that point and I cant understand whats the sequence
Updated by ybonatakis 2 months ago
- Status changed from Feedback to Workable
Seems I focus on different problem. I tried to track the QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578
when the real problem is that the simple_test
passes but its status on the inner openqa is incomplete. QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578
comes after the end of the job, so propably another issue!!?
Running >60 jobs with isotovideo
I encountered the original problem only once. test failed a few times though in different occurrence. So I am running out of ideas and I will be unassigned to get the chance to someone for a fresh look.
Updated by livdywan 2 months ago · Edited
I tried to reproduce like so, using https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_bootstrap&version=Tumbleweed#downloads for the image:
wget https://openqa.opensuse.org/tests/4493151/asset/hdd/opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA/ NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA QEMU_NO_KVM=1 QEMU=x86_64 HDD_1=opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
[...]
[2024-09-19T16:46:25.500999+02:00] [info] [pid:171843] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'openqa-desktop' matched
[...]
[2024-09-19T16:46:59.397550+02:00] [debug] [pid:171843] post_fail_hook failed: command 'ip a del 10.0.2.2/15 dev br1' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/openQAcoretest.pm line 10.
After removing the failing script_run from lib/openQAcoretest.pm
I get this:
[2024-09-19T17:16:22.339125+02:00] [debug] [pid:249716] post_fail_hook failed: command 'ps -ef | tee running_processes.txt' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/utils.pm line 11.
That's presumably the upload to a non-existing web UI timing out. Is there any way to get to the logs produced by post_fail_hook
?
Updated by openqa_review 8 days ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: openqa_install+publish
https://openqa.opensuse.org/tests/4634462#step/tests/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review 4 days ago
- Due date set to 2024-11-29
Setting due date based on mean cycle time of SUSE QE Tools
Updated by jbaier_cz 4 days ago
So to be sure, I used isotovideo
directly as mentioned earlier. The issue can be reproduced usually within twenty runs:
[2024-11-11T11:57:46.375712-05:00] [debug] [pid:124619] Waiting for video encoder to finalize the video
[2024-11-11T11:57:46.376082-05:00] [debug] [pid:124619] The external video encoder (pid 124621) terminated
[2024-11-11T11:57:46.376251-05:00] [debug] [pid:124619] The built-in video encoder (pid 124622) terminated
[2024-11-11T11:57:46.378280-05:00] [debug] [pid:124619] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124619 (/usr/bin/isotovideo: backend)
[2024-11-11T11:57:46.380877-05:00] [debug] [pid:124619] sending magic and exit
[2024-11-11T11:57:46.545817-05:00] [debug] [pid:124599] done with backend process
124599: EXIT 1
[2024-11-11T11:57:46.613745-05:00] [info] Isotovideo exit status: 1
[2024-11-11T11:57:46.722113-05:00] [info] +++ worker notes +++
[2024-11-11T11:57:46.722836-05:00] [info] End time: 2024-11-11 16:57:46
[2024-11-11T11:57:46.722981-05:00] [info] Result: died
[2024-11-11T11:57:46.750790-05:00] [info] Uploading video.webm
[2024-11-11T11:57:46.803408-05:00] [info] Uploading autoinst-log.txt
after that I did try to just execute the inner test with isotovideo to see if that is stable, executed 500 tests without any issue.
As a next step I would like to create a minimal working example for the bug to reproduce. From the latest failure, one can see that the test is successfully executed inside osautoinst/test_running but a test module later, it is already showing as incomplete a few modules later and even long before the shutdown, so it shouldn't be related to shutting down the openQA inside the test (as I initially thought).
Updated by jbaier_cz 4 days ago
Minor improvement https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/216 for the published HDD to be reusable right away by our current tests.
Updated by jbaier_cz about 14 hours ago
- Due date deleted (
2024-11-29) - Status changed from In Progress to Workable
I tried to reuse the published image and spawn 50 inner jobs inside one test. I am struggle to reproduce the issue that way, so I will need to use another approach after HW.