Project

General

Profile

Actions

action #166445

open

openQA Project - coordination #127031: [saga][epic] openQA for SUSE customers

openQA Project - coordination #165393: [epic] Improved code coverage in openQA

[openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M

Added by jbaier_cz 2 months ago. Updated about 14 hours ago.

Status:
Workable
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
Start date:
2024-09-06
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

The inner test is incomplete despite the only step simple_boot is passed.

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in
tests

autoinst-log.txt for the inner tests shows:

[2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0
…
[2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM
[2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder
[2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video
[2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated
[2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend)
[2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit
[2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process
51558: EXIT 1
[2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1
[2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++
[2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31
[2024-09-11T07:00:31.965147-04:00] [info] Result: died
[2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv
[2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt

also from the worker

[2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59)

Reproducible

Fails since (at least) Build :TW.31137 (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous it seems the job fails the same at least 1/100 times.

Expected result

Last good: :TW.31136 (or more recent)

Suggestions

Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*  log.c:  92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*  log.c:  94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*  log.c:  97: core_log_lib_info: compiled with libndctl 63+
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*  log.c:  92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*  log.c:  94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*  log.c:  97: core_log_lib_info: compiled with libndctl 63+
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*  log.c:  92: core_log_lib_info: src version: 2.1.0
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*  log.c:  94: core_log_lib_info: compiled with support for shutdown state
Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*  log.c:  97: core_log_lib_info: compiled with libndctl 63+Sep 11 10:04:18 susetest worker[124339]: [info] 124339: WORKING 1

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open1 closed)

Related to openQA Project - action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S Resolvedjbaier_cz2024-08-22

Actions
Actions #1

Updated by jbaier_cz 2 months ago

  • Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-(test-details|search-results).*matched"
Actions #2

Updated by jbaier_cz 2 months ago

  • Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-(test-details|search-results).*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched"
Actions #3

Updated by jbaier_cz 2 months ago

  • Related to action #165692: [openQA-in-openQA] test does not fail if state is done but incomplete auto_review:"no candidate needle.*openqa-dashboard.*matched" size:S added
Actions #4

Updated by jbaier_cz 2 months ago

  • Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle with tag(s) .*openqa-test-details.*matched" to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
Actions #5

Updated by tinita 2 months ago

Just a small PR so we have the actual logfile for investigation: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/206 Correct uploading of autoinst-log.txt

Actions #6

Updated by jbaier_cz 2 months ago

Example of failure with the autoinst-log.txt attached: https://openqa.opensuse.org/tests/4471721

Actions #7

Updated by jbaier_cz 2 months ago

  • Description updated (diff)
Actions #8

Updated by okurz 2 months ago

  • Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #9

Updated by ybonatakis 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to ybonatakis
Actions #10

Updated by openqa_review 2 months ago

  • Due date set to 2024-09-27

Setting due date based on mean cycle time of SUSE QE Tools

Actions #12

Updated by openqa_review 2 months ago

  • Due date set to 2024-10-01

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by ybonatakis 2 months ago

  • Status changed from In Progress to Feedback

baseclass calls stop_vm the qemu error comes between $self->_stop_video_encoder(); and $self->close_pipes(); messages.
However before we terminate test runner and isotovideo. The error code is assigned when $runner->testfd. the code is commented with unusual shutdown

Actions #15

Updated by ybonatakis 2 months ago · Edited

I set up a virtual machine and I compared the logs:

failed job logs

good job logs

Actions #16

Updated by ybonatakis 2 months ago

ybonatakis wrote in #note-15:

I set up a virtual machine and I compared the logs:

{{collapse(failed job logs)
[2024-09-16T09:38:21.788738-04:00] [debug] [pid:124359] ||| finished boot tests (runtime: 11 s)
[2024-09-16T09:38:21.793705-04:00] [debug] [pid:124339] [autotest] process exited: 0
[2024-09-16T09:38:21.794268-04:00] [debug] [pid:124339] stopping command server 124356 because test execution ended
[2024-09-16T09:38:21.794336-04:00] [debug] [pid:124339] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/IUjZf4snrxneedUn/broadcast
[2024-09-16T09:38:21.821683-04:00] [debug] [pid:124339] commands process exited: 0
[2024-09-16T09:38:21.921957-04:00] [debug] [pid:124339] done with command server
[2024-09-16T09:38:21.922056-04:00] [debug] [pid:124339] stopping autotest process 124359
[2024-09-16T09:38:21.922138-04:00] [debug] [pid:124339] done with autotest process
[2024-09-16T09:38:21.922191-04:00] [debug] [pid:124339] isotovideo failed
[2024-09-16T09:38:21.923372-04:00] [debug] [pid:124339] stopping backend process 124360
[2024-09-16T09:38:21.923566-04:00] [debug] [pid:124360] backend got TERM
[2024-09-16T09:38:21.924008-04:00] [info] [pid:124360] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-16T09:38:22.978555-04:00] [debug] [pid:124360] Passing remaining frames to the video encoder
[out#0/webm @ 0x555a322309c0] video:20KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 3.039424%
frame= 24 fps=2.0 q=50.0 Lsize= 20KiB time=00:00:00.95 bitrate= 174.3kbits/s speed=0.0779x

[2024-09-16T09:38:24.081336-04:00] [debug] [pid:124360] Waiting for video encoder to finalize the video
[2024-09-16T09:38:24.081465-04:00] [debug] [pid:124360] The external video encoder (pid 124362) terminated
[2024-09-16T09:38:24.081668-04:00] [debug] [pid:124360] The built-in video encoder (pid 124363) terminated
[2024-09-16T09:38:24.082421-04:00] [debug] [pid:124360] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124360 (/usr/bin/isotovideo: backend)
[2024-09-16T09:38:24.083632-04:00] [debug] [pid:124360] sending magic and exit
[2024-09-16T09:38:24.228532-04:00] [debug] [pid:124339] done with backend process
124339: EXIT 1
[2024-09-16T09:38:24.283590-04:00] [info] Isotovideo exit status: 1
}}

{{collapse(good job logs)
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
[2024-09-17T04:27:18.242803-04:00] [debug] [pid:10746] [autotest] process exited: 0
[2024-09-17T04:27:18.346308-04:00] [debug] [pid:10746] done with autotest process
[2024-09-17T04:27:18.347247-04:00] [debug] [pid:10746] stopping command server 10775 because test execution ended
[2024-09-17T04:27:18.347427-04:00] [debug] [pid:10746] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/1BzRWhMNYgs94svW/broadcast
[2024-09-17T04:27:18.499693-04:00] [debug] [pid:10746] commands process exited: 0
[2024-09-17T04:27:18.600627-04:00] [debug] [pid:10746] done with command server
[2024-09-17T04:27:18.600955-04:00] [debug] [pid:10746] isotovideo done
[2024-09-17T04:27:18.608242-04:00] [debug] [pid:10786] QEMU status is not 'shutdown', it is 'running'
[2024-09-17T04:27:18.609286-04:00] [debug] [pid:10746] backend shutdown state: 0
[2024-09-17T04:27:18.611449-04:00] [info] [pid:10786] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-17T04:28:01.158750-04:00] [debug] [pid:10786] Passing remaining frames to the video encoder[out#0/webm @ 0x55a05fdbf9c0] video:12KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 6.289207%frame= 66 fps=0.9 q=50.0 Lsize= 13KiB time=00:00:02.70 bitrate= 39.0kbits/s speed=0.0362x

[2024-09-17T04:28:02.324960-04:00] [debug] [pid:10786] Waiting for video encoder to finalize the video
[2024-09-17T04:28:02.325313-04:00] [debug] [pid:10786] The built-in video encoder (pid 10791) terminated
[2024-09-17T04:28:02.325423-04:00] [debug] [pid:10786] The external video encoder (pid 10790) terminated
[2024-09-17T04:28:02.332402-04:00] [debug] [pid:10786] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 10786 (/usr/bin/isotovideo: backend)
[2024-09-17T04:28:02.335732-04:00] [debug] [pid:10786] sending magic and exit
[2024-09-17T04:28:02.336304-04:00] [debug] [pid:10746] received magic close
[2024-09-17T04:28:02.343881-04:00] [debug] [pid:10746] backend process exited: 0
[2024-09-17T04:28:02.450525-04:00] [debug] [pid:10746] stopping backend process 10786
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:27:18.239032-04:00] [debug] [pid:10746] stopping autotest process 10782
10746: EXIT 0
[2024-09-17T04:28:02.702003-04:00] [info] Isotovideo exit status: 0
[2024-09-17T04:28:02.450650-04:00] [debug] [pid:10746] done with backend process
}}

autotest process seems to start termination earlier but I am not sure. I tried to dig into the code but there are different places which are executing tasks at that point and I cant understand whats the sequence

Actions #17

Updated by okurz 2 months ago

ybonatakis runs tests locally to reproduce

Actions #18

Updated by ybonatakis 2 months ago

  • Status changed from Feedback to Workable

Seems I focus on different problem. I tried to track the QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 when the real problem is that the simple_test passes but its status on the inner openqa is incomplete. QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 comes after the end of the job, so propably another issue!!?

Running >60 jobs with isotovideo I encountered the original problem only once. test failed a few times though in different occurrence. So I am running out of ideas and I will be unassigned to get the chance to someone for a fresh look.

Actions #19

Updated by livdywan 2 months ago · Edited

I tried to reproduce like so, using https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_bootstrap&version=Tumbleweed#downloads for the image:

wget https://openqa.opensuse.org/tests/4493151/asset/hdd/opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA/ NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA QEMU_NO_KVM=1 QEMU=x86_64 HDD_1=opensuse-Tumbleweed-x86_64-20240917-minimalx@64bit.qcow2
[...]
[2024-09-19T16:46:25.500999+02:00] [info] [pid:171843] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'openqa-desktop' matched
[...]
[2024-09-19T16:46:59.397550+02:00] [debug] [pid:171843] post_fail_hook failed: command 'ip a del 10.0.2.2/15 dev br1' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/openQAcoretest.pm line 10.

After removing the failing script_run from lib/openQAcoretest.pm I get this:

[2024-09-19T17:16:22.339125+02:00] [debug] [pid:249716] post_fail_hook failed: command 'ps -ef | tee running_processes.txt' timed out at ~/bau/oa/repos/distri-openQA/os-autoinst-distri-openQA/lib/utils.pm line 11.

That's presumably the upload to a non-existing web UI timing out. Is there any way to get to the logs produced by post_fail_hook?

Actions #20

Updated by ybonatakis about 2 months ago

  • Assignee deleted (ybonatakis)
Actions #21

Updated by livdywan about 2 months ago

  • Due date deleted (2024-10-01)

No assignee no due date.

Actions #22

Updated by okurz about 1 month ago

  • Parent task set to #165393
Actions #23

Updated by openqa_review 8 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: openqa_install+publish
https://openqa.opensuse.org/tests/4634462#step/tests/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #24

Updated by jbaier_cz 5 days ago

  • Status changed from Workable to In Progress
  • Assignee set to jbaier_cz
Actions #25

Updated by openqa_review 4 days ago

  • Due date set to 2024-11-29

Setting due date based on mean cycle time of SUSE QE Tools

Actions #26

Updated by jbaier_cz 4 days ago

So to be sure, I used isotovideo directly as mentioned earlier. The issue can be reproduced usually within twenty runs:

[2024-11-11T11:57:46.375712-05:00] [debug] [pid:124619] Waiting for video encoder to finalize the video
[2024-11-11T11:57:46.376082-05:00] [debug] [pid:124619] The external video encoder (pid 124621) terminated
[2024-11-11T11:57:46.376251-05:00] [debug] [pid:124619] The built-in video encoder (pid 124622) terminated
[2024-11-11T11:57:46.378280-05:00] [debug] [pid:124619] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 124619 (/usr/bin/isotovideo: backend)
[2024-11-11T11:57:46.380877-05:00] [debug] [pid:124619] sending magic and exit
[2024-11-11T11:57:46.545817-05:00] [debug] [pid:124599] done with backend process
124599: EXIT 1
[2024-11-11T11:57:46.613745-05:00] [info] Isotovideo exit status: 1
[2024-11-11T11:57:46.722113-05:00] [info] +++ worker notes +++
[2024-11-11T11:57:46.722836-05:00] [info] End time: 2024-11-11 16:57:46
[2024-11-11T11:57:46.722981-05:00] [info] Result: died
[2024-11-11T11:57:46.750790-05:00] [info] Uploading video.webm
[2024-11-11T11:57:46.803408-05:00] [info] Uploading autoinst-log.txt

after that I did try to just execute the inner test with isotovideo to see if that is stable, executed 500 tests without any issue.

As a next step I would like to create a minimal working example for the bug to reproduce. From the latest failure, one can see that the test is successfully executed inside osautoinst/test_running but a test module later, it is already showing as incomplete a few modules later and even long before the shutdown, so it shouldn't be related to shutting down the openQA inside the test (as I initially thought).

Actions #27

Updated by jbaier_cz 4 days ago

Minor improvement https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/216 for the published HDD to be reusable right away by our current tests.

Actions #28

Updated by jbaier_cz about 14 hours ago

  • Due date deleted (2024-11-29)
  • Status changed from In Progress to Workable

I tried to reuse the published image and spawn 50 inner jobs inside one test. I am struggle to reproduce the issue that way, so I will need to use another approach after HW.

Actions

Also available in: Atom PDF