action #68554

closed

[sporadic] some tests fail in incomplete state due to a 'cv::Exception'

Added by ggardet_arm over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version: -
Start date: 2020-07-01
Due date:
% Done: 0%
Estimated time:
Description

Some tests fail in incomplete state due to a cv::Exception:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(3.4.10) /home/abuild/rpmbuild/BUILD/opencv-3.4.10/modules/imgcodecs/src/loadsave.cpp:759: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'

[2020-07-01T12:08:44.074 UTC] [debug] no change: 22801.1s
[2020-07-01T12:08:44.079 UTC] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.

Examples:

Seen on ip-172-25-5-39 worker (AWS A1.metal), running SLE15SP1.

Maybe we should add some checks before calling the imwrite function?


Files

opencv_version.txt (8.94 KB), mkittler, 2020-07-01 14:56
opencv_aarch_aws_sle15sp1.txt (8.67 KB), ggardet_arm, 2020-07-01 15:25

Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #68474: similarity calculation can return NaN (Resolved, nadvornik, 2020-06-26)

Actions #1

Updated by mkittler over 4 years ago

Considering that the backend died with "Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.", this is likely the imwrite call within videoencoder.cpp. However, that call is only made when the last PNG is required (when someone views the running test).

The test also has the new warnings (WARNING: cv::norm() returned NaN (poo#68474)) so it might be a consequence of working around that problem (see #68474).

Actions #2

Updated by mkittler over 4 years ago

I've just looked a little into the OpenCV code, also because of the cv::norm problem and the crashes when handling signals. The cv::norm code is possibly OpenCL-optimized and quite complicated. Since the problem only occurs on certain systems, it would be interesting to know what these have in common. The tool opencv_version can be used to determine the configuration, e.g. whether OpenCL is used. I've attached the output from my TW system, where I have not been able to reproduce the issue so far.

Actions #3

Updated by ggardet_arm over 4 years ago

Here is the opencv_version log for the ip-172-25-5-39 worker (AWS A1.metal), running SLE15SP1, showing the issue.

Actions #4

Updated by dawei_pang over 4 years ago

I can hit the same issue when running test cases continuously: "Result: incomplete, Reason: backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157."

terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.3.0) /home/abuild/rpmbuild/BUILD/opencv-4.3.0/modules/imgcodecs/src/loadsave.cpp:738: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'

My system is openSUSE Tumbleweed 20200628 with the latest openQA and os-autoinst version (4.6.1593615708.5a3bf43f8).

This issue almost blocks my testing.

Actions #5

Updated by mkittler over 4 years ago

@ggardet_arm So it looks like your version is similarly configured (and also not using OpenCL).

@dawei_pang Do the tests also fail when you don't watch them (no browser tab with the job is open)? Is this happening only on some tests (and if yes, which)? Where did you install opencv from? (I'm using currently 4.3.0-1.1 from official TW repos and haven't run into the issue.)

Actions #6

Updated by dawei_pang over 4 years ago

mkittler wrote:

@dawei_pang Do the tests also fail when you don't watch them (no browser tab with the job is open)? Is this happening only on some tests (and if yes, which)? Where did you install opencv from? (I'm using currently 4.3.0-1.1 from official TW repos and haven't run into the issue.)

libopencv comes from the TW repos. The test run stops and fails whether I watch it or not.
I add all test cases to a queue. The first one passes, then the issue occurs at the beginning of the second one. Once the issue occurs, all following test cases fail one by one.

When the issue happens, the following information is always observed in the journal log:
Jul 03 00:33:47 openQAtest kernel: perl[2006]: segfault at 514 ip 000055fc3aed89e0 sp 00007f75a3567830 error 4 in perl[55fc3ae18000+1fa000]
Jul 03 00:33:47 openQAtest kernel: Code: ee f8 33 00 e8 41 d3 f6 ff 8d 53 f9 83 e2 fb 74 05 83 fb 04 75 14 89 df 48 8b 80 d8 07 00 00 31 d2 5b 31 f6 ff e0 0f 1f 40 00 80 14 05 00 00 01 75 e3 48 8b 90 a8 05 00 00 48 85 d2 74 33 83

Since the issue blocked my testing, I switched the system from TW to Leap 15.2, but the issue is still reproducible there (libopencv3_3-3.3.1-lp152.7.9.x86_64).

Actions #7

Updated by mkittler over 4 years ago

The test run stops and fails whether I watch it or not.

Then - despite Encoder not accepting data - this failing imwrite call might not even come from the videoencoder. (The videoencoder only uses this function to produce the last PNG for the live view.)

Since the issue blocked my testing, I switched the system from TW to Leap 15.2, but the issue is still reproducible there

In conclusion (taking the other reports into account as well) this issue is happening with OpenCV 3 and 4 and across different distributions (TW, Leap 15.2, SLE15SP1).


@dawei_pang Do you get the log message WARNING: cv::norm() returned NaN (poo#68474) in the failing tests?

Actions #8

Updated by okurz over 4 years ago

  • Project changed from openQA Tests (public) to openQA Project (public)
  • Category set to Regressions/Crashes
  • Priority changed from Normal to Urgent
  • Target version set to Ready

Multiple people can now reproduce the problem, and marmarek on #opensuse-factory also reports something that sounds related. Raising the priority and moving to the main openQA issue tracker.

Actions #9

Updated by okurz over 4 years ago

  • Related to action #68474: similarity calculation can return NaN added

Actions #10

Updated by mkittler over 4 years ago

  • Assignee set to mkittler
  • Target version changed from Ready to Current Sprint

I guess the place where cv::imwrite is called in videoencoder.cpp can safely be wrapped in a check for whether the image is actually present, without hiding errors. Wrapping the call in the XS interface is likely OK, too. That function already has a return value which can be checked for errors if relevant, so an empty image would just be another error. I also added logging, consistent with image_read.

PR: https://github.com/os-autoinst/os-autoinst/pull/1459
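
For illustration, the kind of guard described above could look roughly like the following. This is a minimal sketch and not the code from the PR: write_image_checked and FakeImage are hypothetical names, and the stand-in image type only mimics the empty() method of cv::Mat so that the example compiles without OpenCV.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for cv::Mat; only empty() is mimicked so that the
// sketch builds without OpenCV.
struct FakeImage {
    std::vector<unsigned char> pixels;
    bool empty() const { return pixels.empty(); }
};

// Calls write(path, img) only if the image holds data. An empty image is
// logged and reported as an ordinary failure (false) instead of reaching the
// writer, where cv::imwrite would fail its !_img.empty() assertion and throw.
template <typename Image, typename Writer>
bool write_image_checked(const std::string &path, const Image &img, Writer write)
{
    if (img.empty()) {
        std::cerr << "Unable to write " << path << ": image is empty\n";
        return false;
    }
    return write(path, img);
}
```

With OpenCV, the image would be a cv::Mat and the writer a lambda forwarding to cv::imwrite(path, img). The caller checks the boolean result either way, so an empty image shows up as just another write error rather than terminating the process.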

Actions #11

Updated by mkittler over 4 years ago

  • Status changed from New to Feedback

The PR has been merged. Can you give feedback on whether this fixes the issue (as I cannot reproduce it locally)?

Actions #12

Updated by mkittler over 4 years ago

  • Priority changed from Urgent to High

Seems like the issue wasn't as urgent after all since nobody has given feedback so far.

Actions #13

Updated by ggardet_arm over 4 years ago

I did not see it lately.

Actions #14

Updated by mkittler over 4 years ago

  • Status changed from Feedback to Resolved

Me neither. I'm closing the issue for now. If it comes up again we can still re-open it.

Actions #15

Updated by mkittler over 4 years ago

  • Target version deleted (Current Sprint)