Project

General

Profile

Actions

action #174259

open

coordination #127031: [saga][epic] openQA for SUSE customers

coordination #165393: [epic] Improved code coverage in openQA

[sporadic] isotovideo fails complaining about still existing testfd filehandle

Added by tinita 7 days ago. Updated about 22 hours ago.

Status:
Workable
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Observation

See also #166445

The inner test is incomplete despite the only step simple_boot is passed.
It seems in some circumstances the process forked by isotovideo exits before the tests_done handling can be worked on and testdf closed, so isotovideo complains about the still open filehandle. https://github.com/os-autoinst/os-autoinst/blob/master/script/isotovideo#L144

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in
tests

autoinst-log.txt for the inner tests shows:

[2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0
…
[2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM
[2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder
[2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video
[2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated
[2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend)
[2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit
[2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process
51558: EXIT 1
[2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1
[2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++
[2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31
[2024-09-11T07:00:31.965147-04:00] [info] Result: died
[2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv
[2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt

also from the worker

[2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59)

Reproducible

Fails since (at least) Build :TW.31137 (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous it seems the job fails the same at least 1/100 times.

Steps to reproduce with isotovideo

  1. Download HDD_1 from openqa_install+publish test
  2. Download vars.json
  3. Run isotovideo until you hit the issue
wget https://openqa.opensuse.org/tests/4699317/asset/hdd/opensuse-Tumbleweed-x86_64-20241210-minimalx@64bit.qcow2
wget https://openqa.opensuse.org/tests/4699317/file/vars.json
for i in $(seq 1 20); do echo RUN $i; isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA PRODUCTDIR=. 2>log || break; done

Expected result

Last good: :TW.31136 (or more recent)

Suggestions

  • Take a look into the inner job's autoinst-log.txt and also consider to upload the worker-log.txt (DONE)
  • Consider compressing and uploading the whole results directory (DONE)
  • Look into the error message visible in https://openqa.opensuse.org/tests/4471721/logfile?filename=tests-journal.log.txt#line-5968 (DONE, this happens in successful tests as well) tests-journal.log.txt
  • Make the error more easily reproducible, e.g. by skipping test modules and ensuring that the problem still reproduces clearly
  • Crosscheck with a different OS base, e.g. Leap vs. Tumbleweed, on multiple levels, could be OS within the inner test has an impact or qemu on the outer layer or something else

Further details

Always latest result in this scenario: latest

  • Issue can be reproduced with just isotovideo -e in a loop. The issue usually happens at least once in a 20 invocations.
  • inner job's autoinst-log.txt
    autoinst-log.txt

  • inner job's worker-log.txt

    [2024-11-11T11:57:08.888829-05:00] [info] Preparing cgroup to start isotovideo
    [2024-11-11T11:57:08.890332-05:00] [warn] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd: Permission den
    ied at /usr/lib/perl5/vendor_perl/5.40.0/Mojo/File.pm line 85.
    [2024-11-11T11:57:08.890537-05:00] [info] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROU
    P_FS.
    [2024-11-11T11:57:08.891118-05:00] [info] Starting isotovideo container
    [2024-11-11T11:57:08.893714-05:00] [info] isotovideo has been started (PID: 124599)
    [2024-11-11T11:57:08.894866-05:00] [info] 124599: WORKING 1
    [2024-11-11T11:57:46.750369-05:00] [info] Uploading video.webm
    [2024-11-11T11:57:46.803320-05:00] [info] Uploading autoinst-log.txt
    [2024-11-11T11:57:46.829894-05:00] [info] Uploading worker-log.txt
    

Related issues 1 (1 open0 closed)

Copied from openQA Tests (public) - action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:SWorkable2024-09-06

Actions
Actions #1

Updated by tinita 7 days ago

  • Copied from action #166445: [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry size:S added
Actions #2

Updated by tinita 7 days ago

  • Subject changed from [openQA-in-openQA][sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry
Actions #3

Updated by tinita 7 days ago

  • Subject changed from [sporadic] test fails in tests, simple_boot incomplete auto_review:"no candidate needle.*openqa-test-details.*matched":retry to [sporadic] isotovideo fails complaining about still existing testfd filehandle
Actions #4

Updated by okurz 7 days ago

  • Due date deleted (2024-12-26)
  • Start date deleted (2024-09-06)

@tinita please use steps to reproduce that don't involve openQA-in-openQA@o3, i.e. just isotovideo as jbaier_cz explained

Actions #5

Updated by tinita 7 days ago

  • Assignee set to tinita
Actions #6

Updated by tinita 6 days ago

  • Status changed from New to In Progress
Actions #7

Updated by jbaier_cz 6 days ago

  • Description updated (diff)

I added steps to reproduce inside a distrobox (just a plain opensuse/distrobox:latest with os-autoinst-devel and openQA-devel installed).

RUN 12
Web application available at http://[::]:20183
GOT GO

3292083: EXIT 101

For some reason, I was not able to reproduce with podman and isotovideo:qemu-x86 container image.

Actions #8

Updated by openqa_review 5 days ago

  • Due date set to 2024-12-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by tinita 3 days ago

I'm still installing openQA to the test vm.
I am apparently missing steps. I am going with os-autoinst-distri-openQA/tests/install/* but still some packages are missing.
The test fails with no kvm-img/qemu-img found

Actions #10

Updated by tinita 3 days ago

I'm running 500 distri-example tests in the VM. So far no failure.

Actions #11

Updated by tinita 2 days ago

So, running the inner example tests in my VM didn't reproduce it.
I now tried to run the openqa-in-openqa test, but nested virtualization is not enabled in the VM. I asked for help in the openplatform channel.

Additional ideas:

  • Change the test code to schedule the inner test multiple times instead of once and run openqa-in-openqa many times and see if it makes a difference. Is it more often reproducible? Does it only fail for the first inner test?
  • Run openqa-in-openqa test and pause the test and schedule the inner test manually multiple times
Actions #12

Updated by tinita 1 day ago

  • Status changed from In Progress to Workable

I wanted to run openqa-in-openqa tests in that VM, but wasn't able to get nested virtualization working. I asked in the openplatform channel and got some help, but still something seems to be missing.
Setting back to workable due to absence.

Actions #13

Updated by livdywan about 22 hours ago

  • Due date deleted (2024-12-28)

Due date with no assignee is not meaningful, hence resetting

Actions

Also available in: Atom PDF