action #166445
openopenQA Project (public) - coordination #127031: [saga][epic] openQA for SUSE customers
openQA Project (public) - coordination #165393: [epic] Improved code coverage in openQA
[openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:S
Added by jbaier_cz 3 months ago. Updated about 3 hours ago.
0%
Description
Observation¶
The inner test is incomplete despite the only step simple_boot is passed.
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in
tests
autoinst-log.txt for the inner tests shows:
[2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0
…
[2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM
[2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder
[2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video
[2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated
[2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend)
[2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit
[2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process
51558: EXIT 1
[2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1
[2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++
[2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31
[2024-09-11T07:00:31.965147-04:00] [info] Result: died
[2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv
[2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt
also from the worker
[2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59)
Reproducible¶
Fails since (at least) Build :TW.31137 (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous it seems the job fails the same at least 1/100 times.
Expected result¶
Last good: :TW.31136 (or more recent)
Suggestions¶
Take a look into the inner job's autoinst-log.txt and also consider to upload the worker-log.txt(DONE)Consider compressing and uploading the whole results directory(DONE)Look into the error message visible in https://openqa.opensuse.org/tests/4471721/logfile?filename=tests-journal.log.txt#line-5968(DONE, this happens in successful tests as well) tests-journal.log.txttests-journal.log.txtSep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0 Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+ Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0 Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+ Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 92: core_log_lib_info: src version: 2.1.0 Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 94: core_log_lib_info: compiled with support for shutdown state Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK* log.c: 97: core_log_lib_info: compiled with libndctl 63+Sep 11 10:04:18 susetest worker[124339]: [info] 124339: WORKING 1
- Make the error more easily reproducible, e.g. by skipping test modules and ensuring that the problem still reproduces clearly
- Crosscheck with a different OS base, e.g. Leap vs. Tumbleweed, on multiple levels, could be OS within the inner test has an impact or qemu on the outer layer or something else
- Create another more specific ticket about the underlying issue independant of os-autoinst-distri-openQA
- In the new ticket suggest to check if there is something wrong with testfd: https://github.com/os-autoinst/os-autoinst/blob/master/script/isotovideo#L144
Further details¶
Always latest result in this scenario: latest
- Issue can be reproduced with just
isotovideo -e
in a loop. The issue usually happens at least once in a 20 invocations. inner job's autoinst-log.txt
autoinst-log.txtautoinst-log.txt[2024-11-11T11:57:25.407072-05:00] [debug] [pid:124618] >>> testapi::_handle_found_needle: found bootloader, similarity 1.00 @ 0/370 [2024-11-11T11:57:25.411049-05:00] [debug] [pid:124618] ||| finished boot tests (runtime: 11 s) [2024-11-11T11:57:25.417387-05:00] [debug] [pid:124599] [autotest] process exited: 0 [2024-11-11T11:57:25.417909-05:00] [debug] [pid:124599] stopping command server 124615 because test execution ended [2024-11-11T11:57:25.418077-05:00] [debug] [pid:124599] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/ZaiH7djgcxpbtus1/broadcast [2024-11-11T11:57:25.441790-05:00] [debug] [pid:124599] commands process exited: 0 [2024-11-11T11:57:25.542427-05:00] [debug] [pid:124599] done with command server [2024-11-11T11:57:25.542619-05:00] [debug] [pid:124599] stopping autotest process 124618 [2024-11-11T11:57:25.542794-05:00] [debug] [pid:124599] done with autotest process [2024-11-11T11:57:25.542924-05:00] [debug] [pid:124599] isotovideo failed [2024-11-11T11:57:25.544034-05:00] [debug] [pid:124599] stopping backend process 124619 [2024-11-11T11:57:25.544427-05:00] [debug] [pid:124619] backend got TERM [2024-11-11T11:57:25.544904-05:00] [info] [pid:124619] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json [2024-11-11T11:57:26.555718-05:00] [debug] [pid:124619] Passing remaining frames to the video encoder [out#0/webm @ 0x559f0307c440] video:22KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 2.759109% frame= 24 fps=0.8 q=50.0 Lsize= 23KiB time=00:00:00.95 bitrate= 194.0kbits/s speed=0.031x [2024-11-11T11:57:46.375712-05:00] [debug] [pid:124619] Waiting for video encoder to finalize the video [2024-11-11T11:57:46.376082-05:00] [debug] [pid:124619] The external video encoder (pid 124621) terminated [2024-11-11T11:57:46.376251-05:00] [debug] [pid:124619] The built-in video encoder (pid 124622) terminated [2024-11-11T11:57:46.378280-05:00] [debug] [pid:124619] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124619 (/usr/bin/isotovideo: backend) [2024-11-11T11:57:46.380877-05:00] [debug] [pid:124619] sending magic and exit [2024-11-11T11:57:46.545817-05:00] [debug] [pid:124599] done with backend process 124599: EXIT 1 [2024-11-11T11:57:46.613745-05:00] [info] Isotovideo exit status: 1 [2024-11-11T11:57:46.722113-05:00] [info] +++ worker notes +++ [2024-11-11T11:57:46.722836-05:00] [info] End time: 2024-11-11 16:57:46 [2024-11-11T11:57:46.722981-05:00] [info] Result: died [2024-11-11T11:57:46.750790-05:00] [info] Uploading video.webm [2024-11-11T11:57:46.803408-05:00] [info] Uploading autoinst-log.txt
inner job's worker-log.txt
[2024-11-11T11:57:08.888829-05:00] [info] Preparing cgroup to start isotovideo [2024-11-11T11:57:08.890332-05:00] [warn] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd: Permission den ied at /usr/lib/perl5/vendor_perl/5.40.0/Mojo/File.pm line 85. [2024-11-11T11:57:08.890537-05:00] [info] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROU P_FS. [2024-11-11T11:57:08.891118-05:00] [info] Starting isotovideo container [2024-11-11T11:57:08.893714-05:00] [info] isotovideo has been started (PID: 124599) [2024-11-11T11:57:08.894866-05:00] [info] 124599: WORKING 1 [2024-11-11T11:57:46.750369-05:00] [info] Uploading video.webm [2024-11-11T11:57:46.803320-05:00] [info] Uploading autoinst-log.txt [2024-11-11T11:57:46.829894-05:00] [info] Uploading worker-log.txt
Updated by jbaier_cz 3 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-(test-details|search-results).*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched"
Updated by jbaier_cz 3 months ago
- Related to action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S added
Updated by jbaier_cz 3 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
Updated by tinita 3 months ago
Just a small PR so we have the actual logfile for investigation: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/206 Correct uploading of autoinst-log.txt
Updated by jbaier_cz 3 months ago
Example of failure with the autoinst-log.txt attached: https://openqa.opensuse.org/tests/4471721
Updated by okurz 3 months ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by ybonatakis 3 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by openqa_review 3 months ago
- Due date set to 2024-09-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 3 months ago
- Due date deleted (
2024-09-27)
Updated by openqa_review 3 months ago
- Due date set to 2024-10-01
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 3 months ago
- Status changed from In Progress to Feedback
baseclass calls stop_vm the qemu error comes between $self->_stop_video_encoder();
and $self->close_pipes();
messages.
However before we terminate test runner and isotovideo. The error code is assigned when $runner->testfd
. the code is commented with unusual shutdown
Updated by ybonatakis 3 months ago
Updated by ybonatakis 3 months ago · Edited
I set up a virtual machine and I compared the logs:
failed job logsfailed job logs
[1;34m[2024-09-16T09:38:21.788738-04:00] [debug] [pid:124359] ||| finished boot tests (runtime: 11 s)[0m
[37m[2024-09-16T09:38:21.793705-04:00] [debug] [pid:124339] [autotest] process exited: 0[0m
[37m[2024-09-16T09:38:21.794268-04:00] [debug] [pid:124339] stopping command server 124356 because test execution ended[0m
[37m[2024-09-16T09:38:21.794336-04:00] [debug] [pid:124339] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/IUjZf4snrxneedUn/broadcast[0m
[37m[2024-09-16T09:38:21.821683-04:00] [debug] [pid:124339] commands process exited: 0[0m
[37m[2024-09-16T09:38:21.921957-04:00] [debug] [pid:124339] done with command server[0m
[37m[2024-09-16T09:38:21.922056-04:00] [debug] [pid:124339] stopping autotest process 124359[0m
[37m[2024-09-16T09:38:21.922138-04:00] [debug] [pid:124339] done with autotest process[0m
[37m[2024-09-16T09:38:21.922191-04:00] [debug] [pid:124339] isotovideo failed[0m
[37m[2024-09-16T09:38:21.923372-04:00] [debug] [pid:124339] stopping backend process 124360[0m
[37m[2024-09-16T09:38:21.923566-04:00] [debug] [pid:124360] backend got TERM[0m
[33m[2024-09-16T09:38:21.924008-04:00] [info] [pid:124360] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json[0m
[37m[2024-09-16T09:38:22.978555-04:00] [debug] [pid:124360] Passing remaining frames to the video encoder[0m
[out#0/webm @ 0x555a322309c0] video:20KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 3.039424%
frame= 24 fps=2.0 q=50.0 Lsize= 20KiB time=00:00:00.95 bitrate= 174.3kbits/s speed=0.0779x
[37m[2024-09-16T09:38:24.081336-04:00] [debug] [pid:124360] Waiting for video encoder to finalize the video[0m
[37m[2024-09-16T09:38:24.081465-04:00] [debug] [pid:124360] The external video encoder (pid 124362) terminated[0m
[37m[2024-09-16T09:38:24.081668-04:00] [debug] [pid:124360] The built-in video encoder (pid 124363) terminated[0m
[37m[2024-09-16T09:38:24.082421-04:00] [debug] [pid:124360] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124360 (/usr/bin/isotovideo: backend)[0m
[37m[2024-09-16T09:38:24.083632-04:00] [debug] [pid:124360] sending magic and exit[0m
[37m[2024-09-16T09:38:24.228532-04:00] [debug] [pid:124339] done with backend process[0m
124339: EXIT 1
[2024-09-16T09:38:24.283590-04:00] [info] Isotovideo exit status: 1
[2024-09-16T09:38:21.788738-04:00] [debug] [pid:124359] ||| finished boot tests (runtime: 11 s)
[2024-09-17T06:31:39.770626-04:00] [debug] [pid:12687] stopping autotest process 12715
[2024-09-17T06:31:39.774491-04:00] [debug] [pid:12687] [autotest] process exited: 0
[2024-09-17T06:31:39.877689-04:00] [debug] [pid:12687] done with autotest process
[2024-09-17T06:31:39.879583-04:00] [debug] [pid:12687] stopping command server 12708 because test execution ended
[2024-09-17T06:31:39.880286-04:00] [debug] [pid:12687] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/ka6wfrpoCY5vl8h3/broadcast
[2024-09-17T06:31:40.008277-04:00] [debug] [pid:12687] commands process exited: 0
[2024-09-17T06:31:40.109281-04:00] [debug] [pid:12687] done with command server
[2024-09-17T06:31:40.109674-04:00] [debug] [pid:12687] isotovideo done
[2024-09-17T06:31:40.114623-04:00] [debug] [pid:12718] QEMU status is not 'shutdown', it is 'running'
[2024-09-17T06:31:40.116448-04:00] [debug] [pid:12687] backend shutdown state: 0
[2024-09-17T06:31:40.118529-04:00] [info] [pid:12718] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-17T06:31:41.219579-04:00] [debug] [pid:12718] Passing remaining frames to the video encoder[out#0/webm @ 0x55a34f82a9c0] video:19KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 3.109252%frame= 24 fps=1.9 q=50.0 Lsize= 20KiB time=00:00:00.95 bitrate= 170.8kbits/s speed=0.0771x
[2024-09-17T06:31:42.247364-04:00] [debug] [pid:12718] Waiting for video encoder to finalize the video
[2024-09-17T06:31:42.247688-04:00] [debug] [pid:12718] The external video encoder (pid 12722) terminated
[2024-09-17T06:31:42.247790-04:00] [debug] [pid:12718] The built-in video encoder (pid 12723) terminated
[2024-09-17T06:31:42.248534-04:00] [debug] [pid:12718] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 12718 (/usr/bin/isotovideo: backend)
[2024-09-17T06:31:42.249839-04:00] [debug] [pid:12718] sending magic and exit
[2024-09-17T06:31:42.250322-04:00] [debug] [pid:12687] received magic close
[2024-09-17T06:31:42.256949-04:00] [debug] [pid:12687] backend p
rocess exited: 0
[2024-09-17T06:31:42.362437-04:00] [debug] [pid:12687] stopping backend process 12718
[2024-09-17T06:31:42.362756-04:00] [debug] [pid:12687] done with backend process12687: EXIT 0
[2024-09-17T06:31:42.563454-04:00] [info] Isotovideo exit status: 0
[2024-09-17T06:31:39.770626-04:00] [debug] [pid:12687] stopping autotest process 12715
Updated by ybonatakis 3 months ago
ybonatakis wrote in #note-15:
I set up a virtual machine and I compared the logs:
{{collapse(failed job logs)
[1;34m[2024-09-16T09:38:21.788738-04:00] [debug] [pid:124359] ||| finished boot tests (runtime: 11 s)[0m
[37m[2024-09-16T09:38:21.793705-04:00] [debug] [pid:124339] [autotest] process exited: 0[0m
[37m[2024-09-16T09:38:21.794268-04:00] [debug] [pid:124339] stopping command server 124356 because test execution ended[0m
[37m[2024-09-16T09:38:21.794336-04:00] [debug] [pid:124339] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/IUjZf4snrxneedUn/broadcast[0m
[37m[2024-09-16T09:38:21.821683-04:00] [debug] [pid:124339] commands process exited: 0[0m
[37m[2024-09-16T09:38:21.921957-04:00] [debug] [pid:124339] done with command server[0m
[37m[2024-09-16T09:38:21.922056-04:00] [debug] [pid:124339] stopping autotest process 124359[0m
[37m[2024-09-16T09:38:21.922138-04:00] [debug] [pid:124339] done with autotest process[0m
[37m[2024-09-16T09:38:21.922191-04:00] [debug] [pid:124339] isotovideo failed[0m
[37m[2024-09-16T09:38:21.923372-04:00] [debug] [pid:124339] stopping backend process 124360[0m
[37m[2024-09-16T09:38:21.923566-04:00] [debug] [pid:124360] backend got TERM[0m
[33m[2024-09-16T09:38:21.924008-04:00] [info] [pid:124360] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json[0m
[37m[2024-09-16T09:38:22.978555-04:00] [debug] [pid:124360] Passing remaining frames to the video encoder[0m
[out#0/webm @ 0x555a322309c0] video:20KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 3.039424%
frame= 24 fps=2.0 q=50.0 Lsize= 20KiB time=00:00:00.95 bitrate= 174.3kbits/s speed=0.0779x
[37m[2024-09-16T09:38:24.081336-04:00] [debug] [pid:124360] Waiting for video encoder to finalize the video[0m
[37m[2024-09-16T09:38:24.081465-04:00] [debug] [pid:124360] The external video encoder (pid 124362) terminated[0m
[37m[2024-09-16T09:38:24.081668-04:00] [debug] [pid:124360] The built-in video encoder (pid 124363) terminated[0m
[37m[2024-09-16T09:38:24.082421-04:00] [debug] [pid:124360] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124360 (/usr/bin/isotovideo: backend)[0m
[37m[2024-09-16T09:38:24.083632-04:00] [debug] [pid:124360] sending magic and exit[0m
[37m[2024-09-16T09:38:24.228532-04:00] [debug] [pid:124339] done with backend process[0m
124339: EXIT 1
[2024-09-16T09:38:24.283590-04:00] [info] Isotovideo exit status: 1
}}{{collapse(good job logs)
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
[2024-09-17T04:27:18.242803-04:00] [debug] [pid:10746] [autotest] process exited: 0
[2024-09-17T04:27:18.346308-04:00] [debug] [pid:10746] done with autotest process
[2024-09-17T04:27:18.347247-04:00] [debug] [pid:10746] stopping command server 10775 because test execution ended
[2024-09-17T04:27:18.347427-04:00] [debug] [pid:10746] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/1BzRWhMNYgs94svW/broadcast
[2024-09-17T04:27:18.499693-04:00] [debug] [pid:10746] commands process exited: 0
[2024-09-17T04:27:18.600627-04:00] [debug] [pid:10746] done with command server
[2024-09-17T04:27:18.600955-04:00] [debug] [pid:10746] isotovideo done
[2024-09-17T04:27:18.608242-04:00] [debug] [pid:10786] QEMU status is not 'shutdown', it is 'running'
[2024-09-17T04:27:18.609286-04:00] [debug] [pid:10746] backend shutdown state: 0
[2024-09-17T04:27:18.611449-04:00] [info] [pid:10786] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-17T04:28:01.158750-04:00] [debug] [pid:10786] Passing remaining frames to the video encoder[out#0/webm @ 0x55a05fdbf9c0] video:12KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 6.289207%frame= 66 fps=0.9 q=50.0 Lsize= 13KiB time=00:00:02.70 bitrate= 39.0kbits/s speed=0.0362x
[2024-09-17T04:28:02.324960-04:00] [debug] [pid:10786] Waiting for video encoder to finalize the video
[2024-09-17T04:28:02.325313-04:00] [debug] [pid:10786] The built-in video encoder (pid 10791) terminated
[2024-09-17T04:28:02.325423-04:00] [debug] [pid:10786] The external video encoder (pid 10790) terminated
[2024-09-17T04:28:02.332402-04:00] [debug] [pid:10786] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 10786 (/usr/bin/isotovideo: backend)
[2024-09-17T04:28:02.335732-04:00] [debug] [pid:10786] sending magic and exit
[2024-09-17T04:28:02.336304-04:00] [debug] [pid:10746] received magic close
[2024-09-17T04:28:02.343881-04:00] [debug] [pid:10746] backend process exited: 0
[2024-09-17T04:28:02.450525-04:00] [debug] [pid:10746] stopping backend process 10786
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process
}}
autotest process seems to start termination earlier but I am not sure. I tried to dig into the code but there are different places which are executing tasks at that point and I cant understand whats the sequence
Updated by ybonatakis 3 months ago
- Status changed from Feedback to Workable
Seems I focus on different problem. I tried to track the QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578
when the real problem is that the simple_test
passes but its status on the inner openqa is incomplete. QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578
comes after the end of the job, so propably another issue!!?
Running >60 jobs with isotovideo
I encountered the original problem only once. test failed a few times though in different occurrence. So I am running out of ideas and I will be unassigned to get the chance to someone for a fresh look.
Updated by livdywan 3 months ago · Edited
I tried to reproduce like so, using https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_bootstrap&version=Tumbleweed#downloads for the image:
wget https://openqa.opensuse.org/tests/4493151/asset/hdd/opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA/ NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA QEMU_NO_KVM=1 QEMU=x86_64 HDD_1=opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
[...]
[2024-09-19T16:46:25.500999+02:00] [info] [pid:171843] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'openqa-desktop' matched
[...]
[2024-09-19T16:46:59.397550+02:00] [debug] [pid:171843] post_fail_hook failed: command 'ip a del 10.0.2.2/15 dev br1' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/openQAcoretest.pm line 10.
After removing the failing script_run from lib/openQAcoretest.pm
I get this:
[2024-09-19T17:16:22.339125+02:00] [debug] [pid:249716] post_fail_hook failed: command 'ps -ef | tee running_processes.txt' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/utils.pm line 11.
That's presumably the upload to a non-existing web UI timing out. Is there any way to get to the logs produced by post_fail_hook
?
Updated by openqa_review about 1 month ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: openqa_install+publish
https://openqa.opensuse.org/tests/4634462#step/tests/1
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review 27 days ago
- Due date set to 2024-11-29
Setting due date based on mean cycle time of SUSE QE Tools
Updated by jbaier_cz 26 days ago
So to be sure, I used isotovideo
directly as mentioned earlier. The issue can be reproduced usually within twenty runs:
[2024-11-11T11:57:46.375712-05:00] [debug] [pid:124619] Waiting for video encoder to finalize the video
[2024-11-11T11:57:46.376082-05:00] [debug] [pid:124619] The external video encoder (pid 124621) terminated
[2024-11-11T11:57:46.376251-05:00] [debug] [pid:124619] The built-in video encoder (pid 124622) terminated
[2024-11-11T11:57:46.378280-05:00] [debug] [pid:124619] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124619 (/usr/bin/isotovideo: backend)
[2024-11-11T11:57:46.380877-05:00] [debug] [pid:124619] sending magic and exit
[2024-11-11T11:57:46.545817-05:00] [debug] [pid:124599] done with backend process
124599: EXIT 1
[2024-11-11T11:57:46.613745-05:00] [info] Isotovideo exit status: 1
[2024-11-11T11:57:46.722113-05:00] [info] +++ worker notes +++
[2024-11-11T11:57:46.722836-05:00] [info] End time: 2024-11-11 16:57:46
[2024-11-11T11:57:46.722981-05:00] [info] Result: died
[2024-11-11T11:57:46.750790-05:00] [info] Uploading video.webm
[2024-11-11T11:57:46.803408-05:00] [info] Uploading autoinst-log.txt
after that I did try to just execute the inner test with isotovideo to see if that is stable, executed 500 tests without any issue.
As a next step I would like to create a minimal working example for the bug to reproduce. From the latest failure, one can see that the test is successfully executed inside osautoinst/test_running but a test module later, it is already showing as incomplete a few modules later and even long before the shutdown, so it shouldn't be related to shutting down the openQA inside the test (as I initially thought).
Updated by jbaier_cz 26 days ago
Minor improvement https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/216 for the published HDD to be reusable right away by our current tests.
Updated by jbaier_cz 12 days ago
So far I am able to reproduce the issue within 20 tests; when using the image generated by a successful run and spawning the same test (without the openQA installation part of course), the issue is not reproducible anymore (or it needs a way more runs, which I am trying at this moment). Anyway, my observation leads to a hypothesis that the issue is connected either to a very first test or that the machine needs a reboot before first use. I will see, if the issue goes away if I introduce reboot just after the installation.
Updated by jbaier_cz 6 days ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
- Description updated (diff)
- Status changed from Workable to New
- Assignee deleted (
jbaier_cz)
I collected the results and to my surprise even the reboot did not help to mitigate the issue. So either I did something wrong in my experiments or my hypothesis is completely wrong. I updated the description with all available data, from the log investigation the only real log difference is isotovideo failed
log line together with error code 1 from isotovideo. No other difference spotted in any log. Not sure, if we can achieve more verbosity? Anyway, as I am out of suggestions, I guess the way forward right now is to put this one back to re-estimate.
Updated by mkittler 6 days ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by tinita 5 days ago
What I don't understand is why stop_commands
is apparently called before stop_autotest
:
[2024-11-11T11:57:25.417909-05:00] [debug] [pid:124599] stopping command server 124615 because test execution ended <----- shouldn't be called that early
[2024-11-11T11:57:25.418077-05:00] [debug] [pid:124599] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/ZaiH7djgcxpbtus1/broadcast
[2024-11-11T11:57:25.441790-05:00] [debug] [pid:124599] commands process exited: 0
[2024-11-11T11:57:25.542427-05:00] [debug] [pid:124599] done with command server
[2024-11-11T11:57:25.542619-05:00] [debug] [pid:124599] stopping autotest process 124618
[2024-11-11T11:57:25.542794-05:00] [debug] [pid:124599] done with autotest process
I ran the inner test locally, shortened the test module to return immediately, added some sleep in the tests_done
handling, but the stop_commands
was always executed after that.
Updated by tinita 5 days ago · Edited
I started a bunch of tests with DEBUG_JSON_RPC=1
in the inner test and got one failure:
https://openqa.opensuse.org/tests/4688380
We can see that tests_done
is sent via myjsorpc:
https://openqa.opensuse.org/tests/4688380/logfile?filename=tests-autoinst-log.txt#line-211
But it is also read 4 lines below. Actually I cannot see that the token XcpFXfMJ
is ever read.
But even before that apparently stop_command_processing
is called. That would happen when the response is not defined.
[2024-12-06T13:57:41.592234-05:00] [debug] [pid:8923] send_json(5) JSON={"cmd":"tests_done","completed":1,"died":0,"json_cmd_token":"XcpFXfMJ"}
[2024-12-06T13:57:41.595469-05:00] [debug] [pid:8904] [autotest] process exited: 0
[2024-12-06T13:57:41.595601-05:00] [debug] [pid:8904] send_json(3) JSON={"json_cmd_token":"WqAryXaW","stop_processing_isotovideo_com":1}
[2024-12-06T13:57:41.595677-05:00] [debug] [pid:8920] read_json(4)
[2024-12-06T13:57:41.595757-05:00] [debug] [pid:8920] read_json(4) json_cmd_token=WqAryXaW
[2024-12-06T13:57:41.595883-05:00] [debug] [pid:8904] stopping command server 8920 because test execution ended
Updated by tinita about 6 hours ago
- Status changed from Workable to In Progress
One little improvement to the test could be to write the api jobs output to a file and upload it, because looking at the screenshot and figure out the status and result of the test isn't nice.
Working on that now.
Updated by tinita about 3 hours ago
https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/217 Upload list of jobs for easier debugging