action #174259
coordination #127031: [saga][epic] openQA for SUSE customers
coordination #165393: [epic] Improved code coverage in openQA
[sporadic] isotovideo fails complaining about still existing testfd filehandle
Description
Observation
See also #166445
The inner test is incomplete even though its only step, simple_boot, passed.
It seems that in some circumstances the process forked by isotovideo exits before the tests_done handling can run and testfd can be
closed, so isotovideo complains about the still open filehandle. https://github.com/os-autoinst/os-autoinst/blob/master/script/isotovideo#L144
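To see interactively whether a filehandle such as testfd is still open in a running isotovideo process, the /proc filesystem can be inspected. This is a generic Linux debugging sketch; the open_fds helper name is my own, and which fd number corresponds to testfd would still need to be confirmed against the isotovideo code.

```shell
#!/bin/sh
# List the open file descriptors of a process via /proc (Linux only).
# Pass the pid of the isotovideo process under investigation, e.g.
# taken from the "[pid:...]" fields in autoinst-log.txt.
open_fds() {
    ls -l "/proc/$1/fd"
}

# Example: inspect the current shell as a stand-in for isotovideo.
open_fds $$
```

Each listed entry is a symlink from the fd number to the underlying file, pipe, or socket, so a leftover testfd would show up as an fd pointing at the test communication channel.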
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in tests
autoinst-log.txt for the inner tests shows:
[2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0
…
[2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM
[2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder
[2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video
[2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated
[2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend)
[2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit
[2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process
51558: EXIT 1
[2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1
[2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++
[2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31
[2024-09-11T07:00:31.965147-04:00] [info] Result: died
[2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv
[2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt
Also from the worker log:
[2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59)
Reproducible
Fails since (at least) Build :TW.31137 (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous the job seems to fail the same way in at least 1 of 100 runs.
Steps to reproduce with isotovideo
- Download HDD_1 from openqa_install+publish test
- Download vars.json
- Run isotovideo until you hit the issue
wget https://openqa.opensuse.org/tests/4699317/asset/hdd/opensuse-Tumbleweed-x86_64-20241210-minimalx@64bit.qcow2
wget https://openqa.opensuse.org/tests/4699317/file/vars.json
for i in $(seq 1 20); do echo RUN $i; isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA PRODUCTDIR=. 2>log || break; done
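The loop above discards the log of every run, including the failing one. A small wrapper can keep only the log of the run that failed, which makes post-mortem analysis of the sporadic failure easier. This is a generic sketch; the retry_until_fail helper name and the run-N.log file naming are my own, not part of the ticket.

```shell
#!/bin/sh
# Run a flaky command repeatedly until it fails, keeping only the log
# of the failing run. Logs of successful runs are deleted.
retry_until_fail() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        echo "RUN $i"
        if ! "$@" >"run-$i.log" 2>&1; then
            echo "failed on run $i, log kept in run-$i.log"
            return 1
        fi
        rm -f "run-$i.log"
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 0
}
```

Against the ticket's reproducer this would be invoked as, for example, retry_until_fail 20 isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA PRODUCTDIR=. so that the failing run's output survives for inspection.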
Expected result
Last good: :TW.31136 (or more recent)
Suggestions
- Take a look into the inner job's autoinst-log.txt and also consider uploading the worker-log.txt (DONE)
- Consider compressing and uploading the whole results directory (DONE)
- Look into the error message visible in https://openqa.opensuse.org/tests/4471721/logfile?filename=tests-journal.log.txt#line-5968 (DONE, this happens in successful tests as well)
- Make the error more easily reproducible, e.g. by skipping test modules and ensuring that the problem still reproduces clearly
- Crosscheck with a different OS base, e.g. Leap vs. Tumbleweed, on multiple levels, could be OS within the inner test has an impact or qemu on the outer layer or something else
Further details
Always latest result in this scenario: latest
- Issue can be reproduced with just
isotovideo -e
in a loop. The issue usually happens at least once in 20 invocations.
Inner job's autoinst-log.txt and worker-log.txt are attached. The worker-log.txt shows:
[2024-11-11T11:57:08.888829-05:00] [info] Preparing cgroup to start isotovideo
[2024-11-11T11:57:08.890332-05:00] [warn] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd: Permission denied at /usr/lib/perl5/vendor_perl/5.40.0/Mojo/File.pm line 85.
[2024-11-11T11:57:08.890537-05:00] [info] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
[2024-11-11T11:57:08.891118-05:00] [info] Starting isotovideo container
[2024-11-11T11:57:08.893714-05:00] [info] isotovideo has been started (PID: 124599)
[2024-11-11T11:57:08.894866-05:00] [info] 124599: WORKING 1
[2024-11-11T11:57:46.750369-05:00] [info] Uploading video.webm
[2024-11-11T11:57:46.803320-05:00] [info] Uploading autoinst-log.txt
[2024-11-11T11:57:46.829894-05:00] [info] Uploading worker-log.txt
Updated by tinita 7 days ago
- Copied from action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:S added
Updated by tinita 7 days ago
- Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
Updated by jbaier_cz 6 days ago
- Description updated (diff)
I added steps to reproduce inside a distrobox (just a plain opensuse/distrobox:latest with os-autoinst-devel and openQA-devel installed).
RUN 12
Web application available at http://[::]:20183
GOT GO
3292083: EXIT 101
For some reason I was not able to reproduce this with podman and the isotovideo:qemu-x86 container image.
Updated by openqa_review 5 days ago
- Due date set to 2024-12-28
Setting due date based on mean cycle time of SUSE QE Tools
Updated by tinita 2 days ago
So, running the inner example tests in my VM didn't reproduce it.
I now tried to run the openqa-in-openqa test, but nested virtualization is not enabled in the VM. I asked for help in the openplatform channel.
Additional ideas:
- Change the test code to schedule the inner test multiple times instead of once, run openqa-in-openqa many times and see if it makes a difference. Is it reproducible more often? Does it only fail for the first inner test?
- Run the openqa-in-openqa test, pause it and schedule the inner test manually multiple times
Updated by tinita 1 day ago
- Status changed from In Progress to Workable
I wanted to run openqa-in-openqa tests in that VM, but wasn't able to get nested virtualization working. I asked in the openplatform channel and got some help, but still something seems to be missing.
Setting back to workable due to absence.
Updated by livdywan about 22 hours ago
- Due date deleted (2024-12-28)
Due date with no assignee is not meaningful, hence resetting