Project

General

Profile

action #174259

Updated by livdywan about 3 hours ago

## Observation 

 See also #166445 

 The inner test is incomplete despite the only step simple_boot is passed. 
 It seems in some circumstances the process forked by isotovideo exits before the tests_done handling can be worked on and `testdf` closed, so isotovideo complains about the still open filehandle. https://github.com/os-autoinst/os-autoinst/blob/master/script/isotovideo#L144 

 openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_bootstrap@64bit-2G fails in 
 [tests](https://openqa.opensuse.org/tests/4472003#step/tests/7) 

 autoinst-log.txt for the inner tests shows: 
 ``` 
 [2024-09-11T07:00:30.593688-04:00] [debug] [pid:51558] [autotest] process exited: 0 
 … 
 [2024-09-11T07:00:30.705630-04:00] [debug] [pid:51578] backend got TERM 
 [2024-09-11T07:00:30.705933-04:00] [info] [pid:51578] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json 
 [2024-09-11T07:00:31.758537-04:00] [debug] [pid:51578] Passing remaining frames to the video encoder 
 [2024-09-11T07:00:31.794569-04:00] [debug] [pid:51578] Waiting for video encoder to finalize the video 
 [2024-09-11T07:00:31.794634-04:00] [debug] [pid:51578] The built-in video encoder (pid 51580) terminated 
 [2024-09-11T07:00:31.794967-04:00] [debug] [pid:51578] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 51578 (/usr/bin/isotovideo: backend) 
 [2024-09-11T07:00:31.795581-04:00] [debug] [pid:51578] sending magic and exit 
 [2024-09-11T07:00:31.908863-04:00] [debug] [pid:51558] done with backend process 
 51558: EXIT 1 
 [2024-09-11T07:00:31.927014-04:00] [info] Isotovideo exit status: 1 
 [2024-09-11T07:00:31.965003-04:00] [info] +++ worker notes +++ 
 [2024-09-11T07:00:31.965099-04:00] [info] End time: 2024-09-11 11:00:31 
 [2024-09-11T07:00:31.965147-04:00] [info] Result: died 
 [2024-09-11T07:00:31.974998-04:00] [info] Uploading video.ogv 
 [2024-09-11T07:00:32.010558-04:00] [info] Uploading autoinst-log.txt 
 ``` 

 also from the worker 

 ``` 
 [2024-09-11T18:03:44.887585Z] [error] REST-API error (POST https://openqa.opensuse.org/api/v1/jobs/4472003/status): Connection error: Premature connection close (remaining tries: 59) 
 ``` 

 ## Steps to reproduce Reproducible 

 Fails since (at least) Build [:TW.31137](https://openqa.opensuse.org/tests/4459309) (current job). Judging from https://openqa.opensuse.org/tests/4471721#next_previous it seems the job fails the same at least 1/100 times. 

 ### Steps to reproduce with isotovideo 
 

 1. Download HDD_1 from openqa_install+publish *openqa_install+publish* test 
  
 2. Download vars.json *vars.json* 
 3. Run isotovideo until you hit the issue (~1/100 times) 

 ``` ```sh 
 wget https://openqa.opensuse.org/tests/4699317/asset/hdd/opensuse-Tumbleweed-x86_64-20241210-minimalx@64bit.qcow2 
 wget https://openqa.opensuse.org/tests/4699317/file/vars.json 
 for i in $(seq 1 20); do echo RUN $i; isotovideo -e CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-openQA NEEDLES_DIR=https://github.com/os-autoinst/os-autoinst-needles-openQA PRODUCTDIR=. 2>log || break; done 
 ``` 
 Also see https://progress.opensuse.org/issues/174259#note-7 

 ## Acceptance Criteria 
 * **AC1**: the scenario openqa_from_bootstrap is consistently stable with fail ratio < 0.1% with inner openQA job not incomplete Expected result 

 Last good: [:TW.31136](https://openqa.opensuse.org/tests/4458743) (or more recent) 

 ## Suggestions 
 * Run ~~Take a look into the inner example tests in a VM *DONE* didn't reproduce it job's autoinst-log.txt and also consider to upload the worker-log.txt~~ (DONE) 
 * Try to run ~~Consider compressing and uploading the openqa-in-openqa test, needing nested virt or emulating whole results directory~~ (DONE) 
 * Change ~~Look into the test code to schedule error message visible in https://openqa.opensuse.org/tests/4471721/logfile?filename=tests-journal.log.txt#line-5968~~ (DONE, this happens in successful tests as well) 
 {{collapse(tests-journal.log.txt) 
 ``` 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*    log.c:    92: core_log_lib_info: src version: 2.1.0 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*    log.c:    94: core_log_lib_info: compiled with support for shutdown state 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124364]: *HARK*    log.c:    97: core_log_lib_info: compiled with libndctl 63+ 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*    log.c:    92: core_log_lib_info: src version: 2.1.0 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*    log.c:    94: core_log_lib_info: compiled with support for shutdown state 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124373]: *HARK*    log.c:    97: core_log_lib_info: compiled with libndctl 63+ 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*    log.c:    92: core_log_lib_info: src version: 2.1.0 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*    log.c:    94: core_log_lib_info: compiled with support for shutdown state 
 Sep 11 10:04:20 susetest qemu-system-x86_64[124372]: *HARK*    log.c:    97: core_log_lib_info: compiled with libndctl 63+Sep 11 10:04:18 susetest worker[124339]: [info] 124339: WORKING 1 
 ``` 
 }} 
 * Make the inner error more easily reproducible, e.g. by skipping test multiple times instead of once modules and run openqa-in-openqa many times and see if it makes a difference. Is it more often reproducible? Does it only fail for ensuring that the first inner test? problem still reproduces clearly 
 * Run openqa-in-openqa test and pause Crosscheck with a different OS base, e.g. Leap vs. Tumbleweed, on multiple levels, could be OS within the inner test and schedule has an impact or qemu on the outer layer or something else 

 ## Further details 
 Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_bootstrap&version=Tumbleweed) 

 * Issue can be reproduced with just `isotovideo -e` in a loop. The issue usually happens at least once in a 20 invocations. 
 * inner job's autoinst-log.txt 
 {{collapse(autoinst-log.txt) 
 ``` 
 [2024-11-11T11:57:25.407072-05:00] [debug] [pid:124618] >>> testapi::_handle_found_needle: found bootloader, similarity 1.00 @ 0/370 
 [2024-11-11T11:57:25.411049-05:00] [debug] [pid:124618] ||| finished boot tests (runtime: 11 s) 
 [2024-11-11T11:57:25.417387-05:00] [debug] [pid:124599] [autotest] process exited: 0 
 [2024-11-11T11:57:25.417909-05:00] [debug] [pid:124599] stopping command server 124615 because test manually multiple times execution ended 
 * Look [2024-11-11T11:57:25.418077-05:00] [debug] [pid:124599] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/ZaiH7djgcxpbtus1/broadcast 
 [2024-11-11T11:57:25.441790-05:00] [debug] [pid:124599] commands process exited: 0 
 [2024-11-11T11:57:25.542427-05:00] [debug] [pid:124599] done with command server 
 [2024-11-11T11:57:25.542619-05:00] [debug] [pid:124599] stopping autotest process 124618 
 [2024-11-11T11:57:25.542794-05:00] [debug] [pid:124599] done with autotest process 
 [2024-11-11T11:57:25.542924-05:00] [debug] [pid:124599] isotovideo failed 
 [2024-11-11T11:57:25.544034-05:00] [debug] [pid:124599] stopping backend process 124619 
 [2024-11-11T11:57:25.544427-05:00] [debug] [pid:124619] backend got TERM 
 [2024-11-11T11:57:25.544904-05:00] [info] [pid:124619] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json 
 [2024-11-11T11:57:26.555718-05:00] [debug] [pid:124619] Passing remaining frames to the video encoder 
 [out#0/webm @ 0x559f0307c440] video:22KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 2.759109% 
 frame=     24 fps=0.8 q=50.0 Lsize=        23KiB time=00:00:00.95 bitrate= 194.0kbits/s speed=0.031x 
 [2024-11-11T11:57:46.375712-05:00] [debug] [pid:124619] Waiting for https://github.com/os-autoinst/os-autoinst/blob/master/script/isotovideo#L151 "isotovideo failed' video encoder to finalize the video 
 [2024-11-11T11:57:46.376082-05:00] [debug] [pid:124619] The external video encoder (pid 124621) terminated 
 [2024-11-11T11:57:46.376251-05:00] [debug] [pid:124619] The built-in video encoder (pid 124622) terminated 
 [2024-11-11T11:57:46.378280-05:00] [debug] [pid:124619] QEMU: qemu-system-x86_64: terminating on signal 15 from uploaded logs from the pid 124619 (/usr/bin/isotovideo: backend) 
 [2024-11-11T11:57:46.380877-05:00] [debug] [pid:124619] sending magic and exit 
 [2024-11-11T11:57:46.545817-05:00] [debug] [pid:124599] done with backend process 
 124599: EXIT 1 
 [2024-11-11T11:57:46.613745-05:00] [info] Isotovideo exit status: 1 
 [2024-11-11T11:57:46.722113-05:00] [info] +++ worker notes +++ 
 [2024-11-11T11:57:46.722836-05:00] [info] End time: 2024-11-11 16:57:46 
 [2024-11-11T11:57:46.722981-05:00] [info] Result: died 
 [2024-11-11T11:57:46.750790-05:00] [info] Uploading video.webm 
 [2024-11-11T11:57:46.803408-05:00] [info] Uploading autoinst-log.txt 
 ``` 
 }} 

 * inner openQA job job's worker-log.txt 
 ``` 
 [2024-11-11T11:57:08.888829-05:00] [info] Preparing cgroup to detect start isotovideo 
 [2024-11-11T11:57:08.890332-05:00] [warn] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd: Permission den 
 ied at /usr/lib/perl5/vendor_perl/5.40.0/Mojo/File.pm line 85. 
 [2024-11-11T11:57:08.890537-05:00] [info] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the problem base mount with MOJO_CGROU 
 P_FS. 
 [2024-11-11T11:57:08.891118-05:00] [info] Starting isotovideo container 
 [2024-11-11T11:57:08.893714-05:00] [info] isotovideo has been started (PID: 124599) 
 [2024-11-11T11:57:08.894866-05:00] [info] 124599: WORKING 1 
 [2024-11-11T11:57:46.750369-05:00] [info] Uploading video.webm 
 [2024-11-11T11:57:46.803320-05:00] [info] Uploading autoinst-log.txt 
 [2024-11-11T11:57:46.829894-05:00] [info] Uploading worker-log.txt 
 ```

Back