action #70873
closedTest fails because auto_review:"Encoder not accepting data":retry, video is missing
Description
Observation¶
It looks like the videoencoder could not be started or got stuck before it could create the video file:
[0m[37m[2020-09-02T04:31:14.339 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.
[0m[33m[2020-09-02T04:31:14.341 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[0m[37m[2020-09-02T04:31:16.395 CEST] [debug] Passing remaining frames to the video encoder
[0m[37m[2020-09-02T04:31:16.395 CEST] [debug] Unable to pass remaining frames to video encoder: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.
[0m[37m[2020-09-02T04:31:16.395 CEST] [debug] Waiting for video encoder to finalize the video
[0m[37m[2020-09-02T04:31:16.395 CEST] [debug] The built-in video encoder (pid 7856) terminated
job reference: https://openqa.suse.de/tests/4628195
Steps to reproduce¶
- Look for references of this ticket detected on o3 and osd:
for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id,t_finished,state,result,test,reason,host from jobs, comments, workers where t_finished >= '2020-01-01' and jobs.assigned_worker_id = workers.id and jobs.id = comments.job_id and comments.text ~ 'poo#70873' order by t_finished desc limit 20;\""; done
Suggestions¶
- DONE:
Let's see whether it happens more often. Maybe the error handling when starting the video encoder could be improved.yes, happened multiple times on both o3 and osd - Check our test code coverage for the code in question, e.g. backend/baseclass.pm line 157
- Try to simulate failures in local tests, e.g. kill video encoder while a test is running
- Improve the error handling, e.g. extend the die message to inform about if the corresponding process is still there, etc.
Updated by okurz over 4 years ago
- Related to action #67342: Support using a "generic" video encoder like ffmpeg instead of only relying on the one provided by os-autoinst itself added
Updated by okurz over 4 years ago
- Related to coordination #68923: [epic] Use external videoencoder in production auto_review:"External encoder not accepting data" added
Updated by okurz over 4 years ago
- Category set to Regressions/Crashes
- Target version set to Ready
I added the two related tickets regarding "external encoder" not assuming that any external encoder is used but that related code changes could have caused this.
Updated by okurz over 4 years ago
- Subject changed from Test fails because auto_review:"Encoder not accepting data", video is missing to Test fails because auto_review:"Encoder not accepting data":retry, video is missing
retriggered jobs seem fine, e.g. https://openqa.suse.de/admin/workers/669 and https://openqa.suse.de/tests/4629502 are ok
Updated by mkittler over 4 years ago
but that related code changes could have caused this
It looks more like the video encoder couldn't be started in the first place or failed very early after startup. Unfortunately we don't have good error handling when starting the video encoder.
Note that my changes for the external video encoder mainly did more of the same but did not change the way the video encoder is launched. And yes, the final "Unable to pass remaining frames to video encoder" messages comes from my code but it is just logged after the stack has already started to terminate due to the previous "Encoder not accepting data" error (which is recorded as reason).
Updated by okurz about 4 years ago
- Description updated (diff)
- Status changed from New to Feedback
- Assignee set to okurz
I conducted the DB search that I added to the ticket description to find further occurrences and found
on o3 and osd (in this order):
id | t_finished | state | result | test | reason | host
---------+---------------------+-------+------------+-------+-------------------------------------------------------------------------------------------------+---------------
1409732 | 2020-09-28 01:12:01 | done | incomplete | gnome | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker4
(1 row)
id | t_finished | state | result | test | reason | host
---------+---------------------+-------+------------+-----------------------------------------------+-------------------------------------------------------------------------------------------------+---------------
4744792 | 2020-09-29 00:20:34 | done | incomplete | mru-iscsi_client_normal_auth_backstore_fileio | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker6
4742850 | 2020-09-28 00:29:22 | done | incomplete | qam-rsync-client | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
4628195 | 2020-09-02 02:31:20 | done | incomplete | mau-qa_kernel_lynis | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
(3 rows)
One example is https://openqa.opensuse.org/tests/1409732 with log details:
[0m[2020-09-28T03:11:04.065 CEST] [debug] tests/installation/bootloader.pm:57 called bootloader_setup::bootmenu_default_params -> lib/bootloader_setup.pm:478 called bootloader_setup::type_boot_parameters -> lib/bootloader_setup.pm:1330 called testapi::type_string
[2020-09-28T03:11:04.065 CEST] [debug] <<< testapi::type_string(string=" Y2DEBUG=1 vga=791 video=1024x768 plymouth.ignore-serial-consoles console=ttyS0 console=tty linuxrc.log=/dev/ttyS0 linuxrc.core=/dev/ttyS0 linuxrc.debug=4,trace reboot_timeout=0 ", max_interval=4, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[37m[2020-09-28T03:11:56.918 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.
Notice the timestamp difference between mm:ss 11:04.1 and mm:ss 11:56.9 .
The relevant code in backend/baseclass.pm is:
sub _write_buffered_data_to_file_handle {
my ($self, $program_name, $array_of_buffers, $fh) = @_;
# write as much data as possible (this is called when $fh is ready to write)
my $data = shift @$array_of_buffers;
my $data_written = $fh->syswrite($data);
die "$program_name not accepting data" unless defined $data_written;
# put remaining data it back into the queue
unshift @$array_of_buffers, substr($data, $data_written) unless $data_written == length($data);
What we could do is check the status of the videoencoder process, e.g. output the PID and check if the process is still running.
Note that "Encoder" means the internal encoder, different than "External encoder" which we would see if any custom configurable video encoder would fail here.
I found out that syswrite sets the errno variable in case of errors, see https://perldoc.perl.org/functions/syswrite . So we should output this to have more details in the case of problems:
https://github.com/os-autoinst/os-autoinst/pull/1550
Updated by okurz about 4 years ago
- Due date set to 2020-11-14
- Priority changed from Normal to Low
PR merged. I guess we should wait couple of weeks and check if we see the error recorded. Then we might have more information.
Updated by okurz about 4 years ago
- Status changed from Feedback to Resolved
$ for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id,t_finished,state,result,test,reason,host from jobs, comments, workers where t_finished >= '2020-01-01' and jobs.assigned_worker_id = workers.id and jobs.id = comments.job_id and comments.text ~ 'poo#70873' order by t_finished desc limit 20;\""; done
id | t_finished | state | result | test | reason | host
---------+---------------------+-------+------------+-------+-------------------------------------------------------------------------------------------------+---------------
1409732 | 2020-09-28 01:12:01 | done | incomplete | gnome | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker4
(1 row)
id | t_finished | state | result | test | reason | host
---------+---------------------+-------+------------+-----------------------------------------------+-------------------------------------------------------------------------------------------------+---------------
4744792 | 2020-09-29 00:20:34 | done | incomplete | mru-iscsi_client_normal_auth_backstore_fileio | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker6
4742850 | 2020-09-28 00:29:22 | done | incomplete | qam-rsync-client | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
4628195 | 2020-09-02 02:31:20 | done | incomplete | mau-qa_kernel_lynis | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
(3 rows)
so no more occurences since more than a month
Updated by okurz about 4 years ago
- Copied to action #81002: Test incompletes with auto_review:"(?s)openqa.suse.de.*Encoder not accepting data.*Uploading video":retry but video is there added