action #123193

closed

02-test_ocr.t fails in OBS size:M

Added by tinita almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-01-16
Due date: 2023-02-01
% Done: 0%
Estimated time:

Description

Observation

The build on Tumbleweed started to fail on January 14:
https://build.opensuse.org/package/live_build_log/devel:openQA/os-autoinst/openSUSE_Factory/x86_64

[  113s] 3: [12:46:28] ./t/02-test_ocr.t .......................... 
[  113s] 3: ok 1 - log output for needle init
[  113s] 3: not ok 2 - log output for OCR
[  113s] 3: 
[  113s] 3: #   Failed test 'log output for OCR'
[  113s] 3: #   at ./t/02-test_ocr.t line 36.
[  113s] 3: # STDERR:
[  113s] 3: # Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
[  113s] 3: # Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
[  113s] 3: # Failed loading language 'eng'
[  113s] 3: # Tesseract couldn't load any languages!
[  113s] 3: # Could not initialize tesseract.
[  113s] 3: # readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: # Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
[  113s] 3: # Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
[  113s] 3: # Failed loading language 'eng'
[  113s] 3: # Tesseract couldn't load any languages!
[  113s] 3: # Could not initialize tesseract.
[  113s] 3: # readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: # 
[  113s] 3: # doesn't match:
[  113s] 3: # (?^u:Tesseract.*OCR)
[  113s] 3: # as expected
[  113s] 3: ok 3 - ocr match 1
[  113s] 3: not ok 4 - log output for tesseract call
[  113s] 3: 
[  113s] 3: #   Failed test 'log output for tesseract call'
[  113s] 3: #   at ./t/02-test_ocr.t line 42.
[  113s] 3: # STDERR:
[  113s] 3: # Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
[  113s] 3: # Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
[  113s] 3: # Failed loading language 'eng'
[  113s] 3: # Tesseract couldn't load any languages!
[  113s] 3: # Could not initialize tesseract.
[  113s] 3: # readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: # Use of uninitialized value in concatenation (.) or string at ./t/02-test_ocr.t line 42.
[  113s] 3: # 
[  113s] 3: # doesn't match:
[  113s] 3: # (?^u:Tesseract.*OCR)
[  113s] 3: # as expected
[  113s] 3: not ok 5 - log output for tesseract call
[  113s] 3: 
[  113s] 3: #   Failed test 'log output for tesseract call'
[  113s] 3: #   at ./t/02-test_ocr.t line 42.
[  113s] 3: # STDERR:
[  113s] 3: # Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
[  113s] 3: # Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
[  113s] 3: # Failed loading language 'eng'
[  113s] 3: # Tesseract couldn't load any languages!
[  113s] 3: # Could not initialize tesseract.
[  113s] 3: # readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: # Use of uninitialized value in concatenation (.) or string at ./t/02-test_ocr.t line 42.
[  113s] 3: # 
[  113s] 3: # doesn't match:
[  113s] 3: # (?^u:Tesseract.*OCR)
[  113s] 3: # as expected
[  113s] 3: ok 6 - OCR area found
[  113s] 3: not ok 7 - multiple OCR regions
[  113s] 3: 
[  113s] 3: #   Failed test 'multiple OCR regions'
[  113s] 3: #   at ./t/02-test_ocr.t line 45.
[  113s] 3: not ok 8 - no (unexpected) warnings (via done_testing)
[  113s] 3: 
[  113s] 3: #   Failed test 'no (unexpected) warnings (via done_testing)'
[  113s] 3: #   at ./t/02-test_ocr.t line 48.
[  113s] 3: # Got the following unexpected warnings:
[  113s] 3: #   1: readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: #   2: readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: #   3: readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: #   4: Use of uninitialized value in concatenation (.) or string at ./t/02-test_ocr.t line 42.
[  113s] 3: #   5: readline() on closed filehandle $fh at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d/ocr.pm line 23.
[  113s] 3: #   6: Use of uninitialized value in concatenation (.) or string at ./t/02-test_ocr.t line 42.
[  113s] 3: 1..8
[  113s] 3: # Looks like you failed 5 tests of 8.
[  114s] 3: Dubious, test returned 5 (wstat 1280, 0x500)
[  114s] 3: Failed 5/8 subtests 

Acceptance criteria

  • AC1: Test no longer fails

Suggestions

  • Wait and see if update fixes the problem
  • Otherwise debug the OCR library locally
Actions #1

Updated by osukup almost 2 years ago

Did the location of the Tesseract trained data change in x86-64 Tumbleweed? Or the expected location ...

abuild@quasar:~/rpmbuild/BUILD/os-autoinst-4.6.1673533640.573778d> rpm -ql tesseract-ocr-traineddata-english
/usr/share/tessdata
/usr/share/tessdata/eng.cube.bigrams
/usr/share/tessdata/eng.cube.fold
/usr/share/tessdata/eng.cube.lm
/usr/share/tessdata/eng.cube.nn
/usr/share/tessdata/eng.cube.params
/usr/share/tessdata/eng.cube.size
/usr/share/tessdata/eng.cube.word-freq
/usr/share/tessdata/eng.tesseract_cube.nn
/usr/share/tessdata/eng.traineddata
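
The listing above shows the data under /usr/share/tessdata, while the failing build looked in /usr/share/tesseract-ocr/tessdata. A minimal sketch of how a build or test wrapper could pick whichever directory actually contains eng.traineddata and export TESSDATA_PREFIX accordingly. The helper name `find_tessdata_dir` is hypothetical and not part of os-autoinst. Following the error message in the log, the variable is pointed at the "tessdata" directory itself; whether the installed Tesseract version expects that directory or its parent should be verified.

```shell
#!/bin/sh
# Hypothetical helper (not part of os-autoinst): print the first directory
# from the argument list that contains eng.traineddata; fail if none does.
find_tessdata_dir() {
    for candidate in "$@"; do
        if [ -f "$candidate/eng.traineddata" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from the build log and the rpm -ql output above.
if tessdata=$(find_tessdata_dir /usr/share/tessdata /usr/share/tesseract-ocr/tessdata); then
    # Assumption: TESSDATA_PREFIX should name the tessdata directory itself.
    export TESSDATA_PREFIX="$tessdata"
fi
```

If neither directory holds the file, TESSDATA_PREFIX is left untouched, so Tesseract falls back to its compiled-in default.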
Actions #2

Updated by osukup almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to osukup
Actions #3

Updated by osukup almost 2 years ago

After passing TESSDATA_PREFIX to the test:

[   93s] 3: not ok 2 - log output for OCR
[   93s] 3: 
[   93s] 3: #   Failed test 'log output for OCR'
[   93s] 3: #   at ./t/02-test_ocr.t line 36.
[   93s] 3: # STDERR:
[   93s] 3: # Warning: Parameter not found: enable_new_segsearch
[   93s] 3: # Estimating resolution as 132
[   93s] 3: # Warning: Parameter not found: enable_new_segsearch
[   93s] 3: # Estimating resolution as 138
[   93s] 3: # 
[   93s] 3: # doesn't match:
[   93s] 3: # (?^u:Tesseract.*OCR)
[   93s] 3: # as expected
[   93s] 3: ok 3 - ocr match 1
[   93s] 3: not ok 4 - log output for tesseract call
[   93s] 3: 
[   93s] 3: #   Failed test 'log output for tesseract call'
[   93s] 3: #   at ./t/02-test_ocr.t line 42.
[   93s] 3: # STDERR:
[   93s] 3: # Warning: Parameter not found: enable_new_segsearch
[   93s] 3: # Estimating resolution as 132
[   93s] 3: # 
[   93s] 3: # doesn't match:
[   93s] 3: # (?^u:Tesseract.*OCR)
[   93s] 3: # as expected
[   94s] 3: not ok 5 - log output for tesseract call
[   94s] 3: 
[   94s] 3: #   Failed test 'log output for tesseract call'
[   94s] 3: #   at ./t/02-test_ocr.t line 42.
[   94s] 3: # STDERR:
[   94s] 3: # Warning: Parameter not found: enable_new_segsearch
[   94s] 3: # Estimating resolution as 138
[   94s] 3: # 
[   94s] 3: # doesn't match:
[   94s] 3: # (?^u:Tesseract.*OCR)
[   94s] 3: # as expected
[   94s] 3: ok 6 - OCR area found
[   94s] 3: ok 7 - multiple OCR regions
[   94s] 3: ok 8 - no (unexpected) warnings (via done_testing)
[   94s] 3: 1..8
[   94s] 3: # Looks like you failed 3 tests of 8.
[   94s] 3: Dubious, test returned 3 (wstat 768, 0x300)
[   94s] 3: Failed 3/8 subtests
Actions #4

Updated by osukup almost 2 years ago

According to Google, these warnings appear if the traineddata is damaged. When I swapped the data for another set from upstream, the test passed without problems ...

Actions #6

Updated by osukup almost 2 years ago

--> So the package in Publishing contains updated data; unfortunately it never got into Factory.

New SR; we will see what needs fixing for it to get into Factory -> https://build.opensuse.org/request/show/1058890

Actions #7

Updated by mkittler almost 2 years ago

I can reproduce it on my local TW system and also saw it in our normal CI jobs. Let's see whether the SR fixes it.

Actions #8

Updated by openqa_review almost 2 years ago

  • Due date set to 2023-02-01

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by osukup almost 2 years ago

SR https://build.opensuse.org/request/show/1059187 was accepted into Factory, so the next snapshot will have the new training data. After the rebuild we will see whether we need to pass TESSDATA_PREFIX in the spec.

Actions #10

Updated by livdywan almost 2 years ago

  • Status changed from In Progress to Feedback

Brought it up briefly. Let's set it to Feedback for now since the SR is there, and in a few days we'll see if this works fine

Actions #11

Updated by mkittler almost 2 years ago

  • Subject changed from 02-test_ocr.t fails in OBS to 02-test_ocr.t fails in OBS size:M
  • Description updated (diff)
Actions #12

Updated by osukup almost 2 years ago

  • Status changed from Feedback to In Progress

The new training data is in the distro. The spec needs to define TESSDATA_PREFIX, but even with the prefix defined it still fails with:

[   92s] 3: not ok 2 - log output for OCR
[   92s] 3: 
[   92s] 3: #   Failed test 'log output for OCR'
[   92s] 3: #   at ./t/02-test_ocr.t line 36.
[   92s] 3: # STDERR:
[   92s] 3: # Estimating resolution as 132
[   92s] 3: # Estimating resolution as 138
[   92s] 3: # 
[   92s] 3: # doesn't match:
[   92s] 3: # (?^u:Tesseract.*OCR)
[   92s] 3: # as expected
[   92s] 3: ok 3 - ocr match 1
[   92s] 3: not ok 4 - log output for tesseract call
[   92s] 3: 
[   92s] 3: #   Failed test 'log output for tesseract call'
[   92s] 3: #   at ./t/02-test_ocr.t line 42.
[   92s] 3: # STDERR:
[   92s] 3: # Estimating resolution as 132
[   92s] 3: # 
[   92s] 3: # doesn't match:
[   92s] 3: # (?^u:Tesseract.*OCR)
[   92s] 3: # as expected
[   93s] 3: not ok 5 - log output for tesseract call
[   93s] 3: 
[   93s] 3: #   Failed test 'log output for tesseract call'
[   93s] 3: #   at ./t/02-test_ocr.t line 42.
[   93s] 3: # STDERR:
[   93s] 3: # Estimating resolution as 138
[   93s] 3: # 
[   93s] 3: # doesn't match:
[   93s] 3: # (?^u:Tesseract.*OCR)
[   93s] 3: # as expected

which means there is also a change in behavior in tesseract-5.3.x
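
The remaining failures come from the test matching captured stderr against the pattern Tesseract.*OCR. A small sketch, under stated assumptions, that reproduces that check with grep: the helper name `stderr_matches_expected` is hypothetical, the "new" sample is a line from the log above, and the "old" sample is only an assumed example of a banner-style line that would satisfy the pattern, not output taken from this ticket. For this ASCII-only pattern, `grep -E` behaves like the Perl regex shown in the log.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the captured stderr text contains
# something matching the pattern the test expects (Tesseract.*OCR).
stderr_matches_expected() {
    printf '%s\n' "$1" | grep -Eq 'Tesseract.*OCR'
}

# Line taken from the tesseract-5.3.x log above.
new_stderr='Estimating resolution as 132'
# Assumed example of a line that satisfies the pattern; not from this ticket.
old_stderr='Tesseract Open Source OCR Engine'

for sample in "$old_stderr" "$new_stderr"; do
    if stderr_matches_expected "$sample"; then
        echo "match:    $sample"
    else
        echo "no match: $sample"
    fi
done
```

A check like this makes it easy to see that the stderr shown above contains nothing the pattern could match, so the test expectation itself would need adjusting rather than the training data.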

Actions #13

Updated by osukup almost 2 years ago

  • Status changed from In Progress to Resolved