Project

General

Profile

action #70873

Test fails because auto_review:"Encoder not accepting data":retry, video is missing

Added by mkittler 11 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2020-09-02
Due date:
2020-11-14
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

It looks like the videoencoder could not be started or got stuck before it could create the video file:

[2020-09-02T04:31:14.339 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.

[2020-09-02T04:31:14.341 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-09-02T04:31:16.395 CEST] [debug] Passing remaining frames to the video encoder
[2020-09-02T04:31:16.395 CEST] [debug] Unable to pass remaining frames to video encoder: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.

[2020-09-02T04:31:16.395 CEST] [debug] Waiting for video encoder to finalize the video
[2020-09-02T04:31:16.395 CEST] [debug] The built-in video encoder (pid 7856) terminated

job reference: https://openqa.suse.de/tests/4628195

Steps to reproduce

  • Look for references of this ticket detected on o3 and osd:
for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id,t_finished,state,result,test,reason,host from jobs, comments, workers where t_finished >= '2020-01-01' and jobs.assigned_worker_id = workers.id and jobs.id = comments.job_id and comments.text ~ 'poo#70873' order by t_finished desc limit 20;\""; done

Suggestions

  • DONE: Let's see whether it happens more often. Maybe the error handling when starting the video encoder could be improved. yes, happened multiple times on both o3 and osd
  • Check our test code coverage for the code in question, e.g. backend/baseclass.pm line 157
  • Try to simulate failures in local tests, e.g. kill video encoder while a test is running
  • Improve the error handling, e.g. extend the die message to inform about if the corresponding process is still there, etc.

Related issues

Related to openQA Project - action #67342: Support using a "generic" video encoder like ffmpeg instead of only relying on the one provided by os-autoinst itselfResolved2020-05-27

Related to openQA Infrastructure - coordination #68923: [epic] Use external videoencoder in production auto_review:"External encoder not accepting data"Blocked2020-11-13

Copied to openQA Project - action #81002: Test incompletes with auto_review:"(?s)openqa.suse.de.*Encoder not accepting data.*Uploading video":retry but video is thereNew

History

#1 Updated by okurz 11 months ago

  • Related to action #67342: Support using a "generic" video encoder like ffmpeg instead of only relying on the one provided by os-autoinst itself added

#2 Updated by okurz 11 months ago

  • Related to coordination #68923: [epic] Use external videoencoder in production auto_review:"External encoder not accepting data" added

#3 Updated by okurz 11 months ago

  • Category set to Concrete Bugs
  • Target version set to Ready

I added the two related tickets regarding "external encoder" not assuming that any external encoder is used but that related code changes could have caused this.

#4 Updated by okurz 11 months ago

  • Subject changed from Test fails because auto_review:"Encoder not accepting data", video is missing to Test fails because auto_review:"Encoder not accepting data":retry, video is missing

#5 Updated by mkittler 11 months ago

but that related code changes could have caused this

It looks more like the video encoder couldn't be started in the first place or failed very early after startup. Unfortunately we don't have good error handling when starting the video encoder.

Note that my changes for the external video encoder mainly did more of the same but did not change the way the video encoder is launched. And yes, the final "Unable to pass remaining frames to video encoder" messages comes from my code but it is just logged after the stack has already started to terminate due to the previous "Encoder not accepting data" error (which is recorded as reason).

#6 Updated by okurz 10 months ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee set to okurz

I conducted the DB search that I added to the ticket description to find further occurrences and found

on o3 and osd (in this order):

   id    |     t_finished      | state |   result   | test  |                                             reason                                              |     host      
---------+---------------------+-------+------------+-------+-------------------------------------------------------------------------------------------------+---------------
 1409732 | 2020-09-28 01:12:01 | done  | incomplete | gnome | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker4
(1 row)

   id    |     t_finished      | state |   result   |                     test                      |                                             reason                                              |     host      
---------+---------------------+-------+------------+-----------------------------------------------+-------------------------------------------------------------------------------------------------+---------------
 4744792 | 2020-09-29 00:20:34 | done  | incomplete | mru-iscsi_client_normal_auth_backstore_fileio | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker6
 4742850 | 2020-09-28 00:29:22 | done  | incomplete | qam-rsync-client                              | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
 4628195 | 2020-09-02 02:31:20 | done  | incomplete | mau-qa_kernel_lynis                           | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
(3 rows)

One example is https://openqa.opensuse.org/tests/1409732 with log details:

[2020-09-28T03:11:04.065 CEST] [debug] tests/installation/bootloader.pm:57 called bootloader_setup::bootmenu_default_params -> lib/bootloader_setup.pm:478 called bootloader_setup::type_boot_parameters -> lib/bootloader_setup.pm:1330 called testapi::type_string
[2020-09-28T03:11:04.065 CEST] [debug] <<< testapi::type_string(string="  Y2DEBUG=1 vga=791 video=1024x768 plymouth.ignore-serial-consoles console=ttyS0 console=tty linuxrc.log=/dev/ttyS0 linuxrc.core=/dev/ttyS0 linuxrc.debug=4,trace reboot_timeout=0  ", max_interval=4, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-09-28T03:11:56.918 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.

Notice the timestamp difference between mm:ss 11:04.1 and mm:ss 11:56.9 .
The relevant code in backend/baseclass.pm is:

sub _write_buffered_data_to_file_handle {
    my ($self, $program_name, $array_of_buffers, $fh) = @_;

    # write as much data as possible (this is called when $fh is ready to write)
    my $data         = shift @$array_of_buffers;
    my $data_written = $fh->syswrite($data);
    die "$program_name not accepting data" unless defined $data_written;

    # put remaining data it back into the queue
    unshift @$array_of_buffers, substr($data, $data_written) unless $data_written == length($data);

What we could do is check the status of the videoencoder process, e.g. output the PID and check if the process is still running.
Note that "Encoder" means the internal encoder, different than "External encoder" which we would see if any custom configurable video encoder would fail here.

I found out that syswrite sets the errno variable in case of errors, see https://perldoc.perl.org/functions/syswrite . So we should output this to have more details in the case of problems:
https://github.com/os-autoinst/os-autoinst/pull/1550

#7 Updated by okurz 10 months ago

  • Due date set to 2020-11-14
  • Priority changed from Normal to Low

PR merged. I guess we should wait couple of weeks and check if we see the error recorded. Then we might have more information.

#8 Updated by okurz 9 months ago

  • Status changed from Feedback to Resolved
$ for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id,t_finished,state,result,test,reason,host from jobs, comments, workers where t_finished >= '2020-01-01' and jobs.assigned_worker_id = workers.id and jobs.id = comments.job_id and comments.text ~ 'poo#70873' order by t_finished desc limit 20;\""; done

   id    |     t_finished      | state |   result   | test  |                                             reason                                              |     host      
---------+---------------------+-------+------------+-------+-------------------------------------------------------------------------------------------------+---------------
 1409732 | 2020-09-28 01:12:01 | done  | incomplete | gnome | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker4
(1 row)

   id    |     t_finished      | state |   result   |                     test                      |                                             reason                                              |     host      
---------+---------------------+-------+------------+-----------------------------------------------+-------------------------------------------------------------------------------------------------+---------------
 4744792 | 2020-09-29 00:20:34 | done  | incomplete | mru-iscsi_client_normal_auth_backstore_fileio | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker6
 4742850 | 2020-09-28 00:29:22 | done  | incomplete | qam-rsync-client                              | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
 4628195 | 2020-09-02 02:31:20 | done  | incomplete | mau-qa_kernel_lynis                           | backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157. | openqaworker5
(3 rows)

so no more occurences since more than a month

#9 Updated by okurz 8 months ago

  • Copied to action #81002: Test incompletes with auto_review:"(?s)openqa.suse.de.*Encoder not accepting data.*Uploading video":retry but video is there added

Also available in: Atom PDF