action #68554
Closed: [sporadic] some tests fail in incomplete state due to a 'cv::Exception'
Description
Some tests fail in incomplete state due to a cv::Exception:
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(3.4.10) /home/abuild/rpmbuild/BUILD/opencv-3.4.10/modules/imgcodecs/src/loadsave.cpp:759: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'
[2020-07-01T12:08:44.074 UTC] [debug] no change: 22801.1s
[2020-07-01T12:08:44.079 UTC] [debug] Backend process died, backend errors are reported below in the following lines:
Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.
Examples:
Seen on ip-172-25-5-39 worker (AWS A1.metal), running SLE15SP1.
Maybe we should add some checks before calling the imwrite function?
Updated by mkittler over 4 years ago
Considering the reason the backend died is "Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157.", this is likely the imwrite call within videoencoder.cpp. However, that call is only made when the last PNG is required (when someone views the running test).
The test also has the new warnings (WARNING: cv::norm() returned NaN (poo#68474)), so it might be a consequence of working around that problem (see #68474).
Updated by mkittler over 4 years ago
- File opencv_version.txt added
I've just looked a little bit into the OpenCV code, also because of the cv::norm problem and the crashes when handling signals. The cv::norm code is possibly OpenCL-optimized and quite complicated. Since the problem only occurs on certain systems, it would be interesting to know what these have in common. The tool opencv_version can be used to determine the configuration, e.g. whether OpenCL is used. I've attached the output from my TW system, where I could not reproduce the issue so far.
Updated by ggardet_arm over 4 years ago
Here is the opencv_version log for the ip-172-25-5-39 worker (AWS A1.metal), running SLE15SP1, showing the issue.
Updated by dawei_pang over 4 years ago
I can hit the same issue when running test cases continuously: "Result: incomplete, Reason: backend died: Encoder not accepting data at /usr/lib/os-autoinst/backend/baseclass.pm line 157."
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.3.0) /home/abuild/rpmbuild/BUILD/opencv-4.3.0/modules/imgcodecs/src/loadsave.cpp:738: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'
My system is openSUSE Tumbleweed 20200628 with the latest openQA and os-autoinst version (4.6.1593615708.5a3bf43f8).
This issue almost blocks my testing.
Updated by mkittler over 4 years ago
@ggardet_arm So it looks like your version is similarly configured (and also not using OpenCL).
@dawei_pang Do the tests also fail when you don't watch them (no browser tab with the job is open)? Is this happening only on some tests (and if yes, which)? Where did you install opencv from? (I'm currently using 4.3.0-1.1 from official TW repos and haven't run into the issue.)
Updated by dawei_pang over 4 years ago
mkittler wrote:
@dawei_pang Do the tests also fail when you don't watch them (no browser tab with the job is open)? Is this happening only on some tests (and if yes, which)? Where did you install opencv from? (I'm currently using 4.3.0-1.1 from official TW repos and haven't run into the issue.)
The libopencv comes from TW repos. The test run stops and fails whether I watch it or not.
I add all test cases to a queue. The first one passes, then the issue occurs at the beginning of the second test case; once the issue occurs, all following test cases fail one by one.
When the issue happens, the following information is always observed in the journal log:
Jul 03 00:33:47 openQAtest kernel: perl[2006]: segfault at 514 ip 000055fc3aed89e0 sp 00007f75a3567830 error 4 in perl[55fc3ae18000+1fa000]
Jul 03 00:33:47 openQAtest kernel: Code: ee f8 33 00 e8 41 d3 f6 ff 8d 53 f9 83 e2 fb 74 05 83 fb 04 75 14 89 df 48 8b 80 d8 07 00 00 31 d2 5b 31 f6 ff e0 0f 1f 40 00 80 14 05 00 00 01 75 e3 48 8b 90 a8 05 00 00 48 85 d2 74 33 83
Since the issue is blocking my testing, I switched the system from TW to Leap 15.2, but the issue is still reproducible there: libopencv3_3-3.3.1-lp152.7.9.x86_64
Updated by mkittler over 4 years ago
The test run stops and fails whether I watch it or not.
Then - despite "Encoder not accepting data" - this failing imwrite call might not even come from the videoencoder. (The videoencoder only uses this function to produce the last PNG for the live view.)
Since the issue is blocking my testing, I switched the system from TW to Leap 15.2, but the issue is still reproducible there
In conclusion (taking the other reports into account as well) this issue is happening with OpenCV 3 and 4 and across different distributions (TW, Leap 15.2, SLE15SP1).
@dawei_pang Do you get the log message "WARNING: cv::norm() returned NaN (poo#68474)" in the failing tests?
Updated by okurz over 4 years ago
- Project changed from openQA Tests (public) to openQA Project (public)
- Category set to Regressions/Crashes
- Priority changed from Normal to Urgent
- Target version set to Ready
Multiple people can reproduce the problem now, and marmarek on #opensuse-factory also reports something that sounds related. Raising prio and moving to main openQA issue tracker.
Updated by okurz over 4 years ago
- Related to action #68474: similarity calculation can return NaN added
Updated by mkittler over 4 years ago
- Assignee set to mkittler
- Target version changed from Ready to Current Sprint
I guess the place where cv::imwrite is called in videoencoder.cpp can be safely wrapped in a check whether the image is actually present, without hiding errors. Wrapping the call in the XS interface is likely ok, too. The function already has a return value which might be checked for errors if relevant, so an empty image would just be another error. I also added logging, consistent with image_read.
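For illustration only, the kind of guard described in this comment could look roughly like the sketch below. This is not the merged patch. To keep it compilable without OpenCV, `Image`, `Writer`, and `save_last_png` are hypothetical stand-ins; in the real code the check would be `cv::Mat::empty()` in front of `cv::imwrite`, which is what trips the `!_img.empty()` assertion in the logs above.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for cv::Mat so the sketch builds without OpenCV.
struct Image {
    int rows = 0;
    int cols = 0;
    std::vector<std::uint8_t> data;
    bool empty() const { return rows == 0 || cols == 0 || data.empty(); }
};

// Hypothetical writer callback; cv::imwrite would play this role in practice.
using Writer = std::function<bool(const std::string&, const Image&)>;

// Reports an empty image as an ordinary failure (return value plus log line)
// instead of handing it to the writer, which would throw.
bool save_last_png(const std::string& path, const Image& img, const Writer& write_png) {
    if (img.empty()) {
        std::cerr << "save_last_png: not writing " << path << ": image is empty\n";
        return false;
    }
    return write_png(path, img);
}
```

Because the empty case is folded into the existing boolean result, callers that already check the return value need no changes, and the log line keeps the failure visible rather than hiding it.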
Updated by mkittler over 4 years ago
- Status changed from New to Feedback
The PR has been merged. Can you give feedback whether this fixes the issue (as I cannot reproduce it locally)?
Updated by mkittler over 4 years ago
- Priority changed from Urgent to High
Seems like the issue wasn't as urgent after all since nobody has given feedback so far.
Updated by mkittler over 4 years ago
- Status changed from Feedback to Resolved
Me neither. I'm closing the issue for now. If it comes up again we can still re-open it.