action #103518

opencv update broke os-autoinst (was: Tests on Raspberry Pi 2/3/4 are broken)

Added by ggardet_arm 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2021-12-06
Due date:
% Done:

0%

Estimated time:

Description

Tests on Raspberry Pi 2/3/4 are broken since last Tumbleweed host update.

Log:

GOT GO

[2021-12-06T09:04:37.319492+01:00] [debug] Snapshots are not supported
[2021-12-06T09:04:37.336061+01:00] [debug] ||| starting prepare_firstboot tests/jeos/prepare_firstboot.pm
[2021-12-06T09:04:37.339090+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console
[2021-12-06T09:04:37.339723+01:00] [debug] <<< testapi::select_console(testapi_console="root-ssh")
[2021-12-06T09:04:37.355665+01:00] [debug] Establishing VNC connection to localhost:33181
[2021-12-06T09:04:37.586016+01:00] [debug] Connected to Xvnc - PID 10542
icewm PID is 10546
[2021-12-06T09:04:38.605791+01:00] [debug] Wait for SSH on host 192.168.0.56 (timeout: 240)
xterm PID is 11012
[2021-12-06T09:06:52.797570+01:00] [debug] <<< backend::baseclass::start_ssh_serial(hostname="192.168.0.56", password="SECRET", username="root")
[2021-12-06T09:06:52.798257+01:00] [debug] <<< backend::baseclass::new_ssh_connection(password="SECRET", hostname="192.168.0.56", username="root")
[2021-12-06T09:06:53.048026+01:00] [debug] SSH connection to root@192.168.0.56 established
[2021-12-06T09:06:54.530075+01:00] [debug] ssh xterm vt: grabbing serial console
[2021-12-06T09:06:54.585865+01:00] [debug] led state 0 0 0 -261
[2021-12-06T09:06:54.635047+01:00] [debug] activate_console, console: root-ssh, type: ssh
[2021-12-06T09:06:54.635678+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console -> lib/susedistribution.pm:807 called susedistribution::handle_password_prompt -> lib/susedistribution.pm:48 called testapi::assert_screen
[2021-12-06T09:06:54.636039+01:00] [debug] <<< testapi::assert_screen(mustmatch="password-prompt", timeout=60)
X connection to :33181 broken (explicit kill or server shutdown).
[2021-12-06T09:06:54.695511+01:00] [debug] backend process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":33181"
[2021-12-06T09:06:54.698685+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 10546 and exit status: 1
[2021-12-06T09:06:54.699638+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 11014 and exit status: 0
[2021-12-06T09:06:54.699977+01:00] [debug] stopping command server 7816 because test execution ended
[2021-12-06T09:06:54.700242+01:00] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20033/XhYJpnZ3EKa3Iiaf/broadcast
[2021-12-06T09:06:54.701610+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 11012 and exit status: 84
[2021-12-06T09:06:54.708448+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 10542 and exit status: 0
[2021-12-06T09:06:55.067383+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 7825 and exit status: 0
[2021-12-06T09:06:55.671518+01:00] [debug] commands process exited: 0
[2021-12-06T09:06:55.772073+01:00] [debug] done with command server
[2021-12-06T09:06:55.772351+01:00] [debug] stopping autotest process 7819
[2021-12-06T09:06:55.773228+01:00] [debug] autotest received signal TERM, saving results of current test before exiting
[2021-12-06T09:06:55.783371+01:00] [debug] [autotest] process exited: 1
[2021-12-06T09:06:55.883881+01:00] [debug] done with autotest process
[2021-12-06T09:06:55.884093+01:00] [debug] isotovideo failed
[2021-12-06T09:06:55.885336+01:00] [debug] stopping backend process 7820
[2021-12-06T09:06:55.885547+01:00] [debug] done with backend process
7812: EXIT 1
[2021-12-06T09:06:56.026552+01:00] [info] Isotovideo exit status: 1
[2021-12-06T09:06:56.160740+01:00] [info] +++ worker notes +++
[2021-12-06T09:06:56.161131+01:00] [info] End time: 2021-12-06 08:06:56
[2021-12-06T09:06:56.161325+01:00] [info] Result: died

See: https://openqa.opensuse.org/tests/2072487

History

#1 Updated by okurz 6 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready

Could be related to https://github.com/os-autoinst/os-autoinst/pull/1868 but I don't see any obvious error message.

The first bad job in this scenario (https://openqa.opensuse.org/tests/2069354/logfile?filename=autoinst-log.txt) shows "Current version is 4.6.1638454781.7b07525b [interface v24]"; the last good one (https://openqa.opensuse.org/tests/2066263/logfile?filename=autoinst-log.txt) has "Current version is 4.6.1638007345.ae6eed2a [interface v24]". https://github.com/os-autoinst/os-autoinst/compare/ae6eed2a...7b07525b shows 15 commits. I suspect that either https://github.com/os-autoinst/os-autoinst/commit/68625004e128b5565353fcbf5ff366bb271ebb74 or https://github.com/os-autoinst/os-autoinst/commit/b1bfd883c185034f1c48385636ba8cef9bb17f24 causes the problem here. Could you try going back to an os-autoinst version from before the mentioned pull request? Alternatively, could you give me access to the machine?

#2 Updated by ggardet_arm 6 months ago

Trying with os-autoinst from aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1
at https://openqa.opensuse.org/tests/2072532

#3 Updated by ggardet_arm 6 months ago

ggardet_arm wrote:

Trying with os-autoinst from aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1
at https://openqa.opensuse.org/tests/2072532

This version indeed works! What would you like me to try in order to debug this further?

#4 Updated by okurz 6 months ago

  • I wonder why we don't see error messages that point to the real problem. I would expect Perl warnings such as "Too many arguments for subroutine $foo" or similar. You might see more when running the tests manually on that machine with isotovideo: put the vars.json file of one job, e.g. https://openqa.opensuse.org/tests/2072487, into a local directory on that worker host, provide the assets there locally (e.g. symlink "openSUSE-Tumbleweed-ARM-JeOS-raspberrypi2.armv7l-2021.12.02-Snapshot20211202.raw.xz" from the worker cache into /tmp), and run isotovideo from that directory.
  • Do you know which consoles are being used? Only some files were changed in https://github.com/os-autoinst/os-autoinst/commit/b1bfd883c185034f1c48385636ba8cef9bb17f24
  • As the mentioned commits contain only Perl code changes, you can try to apply the changes manually as patches and bisect that way.

#5 Updated by ggardet_arm 5 months ago

https://build.opensuse.org/package/show/devel:openQA/os-autoinst shows errors:

[  141s] The following tests FAILED:
[  141s]      3 - test-perl-testsuite (Failed)

Maybe related?

With the latest OpenCV, we now hit the following error:

[2021-12-08T10:00:42.317288+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console -> lib/susedistribution.pm:807 called susedistribution::handle_password_prompt -> lib/susedistribution.pm:48 called testapi::assert_screen
[2021-12-08T10:00:42.317662+01:00] [debug] <<< testapi::assert_screen(mustmatch="password-prompt", timeout=60)
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.4) /home/abuild/rpmbuild/BUILD/opencv-4.5.4/modules/imgproc/src/smooth.dispatch.cpp:293: error: (-215:Assertion failed) ksize.width > 0 && ksize.width % 2 == 1 && ksize.height > 0 && ksize.height % 2 == 1 in function 'createGaussianKernels'

Unexpected end of data 0
X connection to :37707 broken (explicit kill or server shutdown).
[2021-12-08T10:00:42.362835+01:00] [debug] backend process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":37707"

See: https://openqa.opensuse.org/tests/2076217

I am still using os-autoinst from the aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1

#6 Updated by favogt 5 months ago

  • Status changed from Feedback to Workable

I can reproduce the crash on x86_64 Tumbleweed as well, after upgrading OpenCV 4.5.2-2.2 -> 4.5.4-1.1.

With GDB, it looks like the ABI got broken and inside OpenCV the value of ksize is corrupt: $2 = {width = 0xe87a3e08, height = 0x7fff}
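To illustrate why the corrupt value trips the assertion quoted in the log, here is a minimal sketch that re-states the condition from the error message (positive and odd in both dimensions) and applies it to the two values printed by GDB. The helper names `valid_gaussian_ksize`, `as_signed` and `self_check` are hypothetical and not part of OpenCV or tinycv; the 3x3 size is only an example of a valid kernel, and the 32-bit signed fields are an assumption about how the size is stored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: same condition as in the assertion message,
// ksize.width > 0 && ksize.width % 2 == 1 && ksize.height > 0 && ksize.height % 2 == 1
constexpr bool valid_gaussian_ksize(std::int32_t width, std::int32_t height) {
    return width > 0 && width % 2 == 1 && height > 0 && height % 2 == 1;
}

// GDB printed the fields in hex, which hides the sign. This converts the raw
// 32-bit pattern to a signed value (modular conversion; guaranteed since
// C++20, same result on mainstream compilers before that).
constexpr std::int32_t as_signed(std::uint32_t bits) {
    return static_cast<std::int32_t>(bits);
}

// Small self-check that can be called from any main().
void self_check() {
    // An illustrative valid kernel size: positive and odd.
    assert(valid_gaussian_ksize(3, 3));
    // The values seen in GDB: the width is negative (and even), so the check fails.
    assert(as_signed(0xe87a3e08u) < 0);
    assert(!valid_gaussian_ksize(as_signed(0xe87a3e08u), 0x7fff));
}
```

The height 0x7fff alone would pass (it is positive and odd); it is the garbage in the width field that makes the condition false.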

#7 Updated by favogt 5 months ago

The ABI breakage is because class Size is trivially copyable in the new version, so it can be passed in a register ($rdx). In the older version it's not the case, and it's pushed to the stack instead. So tinycv writes 0x30000003 to the stack, but the OpenCV library reads $rdx again, which contains random stuff.
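The difference can be sketched with two hypothetical stand-in classes (these are not the real OpenCV definitions): one with a user-provided copy constructor and one with a defaulted one. As far as I understand the Itanium C++ ABI used on x86_64 Linux, a class with a non-trivial copy constructor is handed to the callee through memory, while a small trivially copyable class can travel in a register. The type traits below only show which category each class falls into; they do not inspect the calling convention itself.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the older layout: user-provided copy constructor,
// which makes the class non-trivially copyable.
struct OldStyleSize {
    int width;
    int height;
    OldStyleSize(int w, int h) : width(w), height(h) {}
    OldStyleSize(const OldStyleSize& other)
        : width(other.width), height(other.height) {}
};

// Hypothetical stand-in for the newer layout: defaulted copy constructor,
// so the class is trivially copyable.
struct NewStyleSize {
    int width;
    int height;
    NewStyleSize(int w, int h) : width(w), height(h) {}
    NewStyleSize(const NewStyleSize&) = default;
};

// Same data members, different copy semantics as seen by the compiler.
static_assert(!std::is_trivially_copyable<OldStyleSize>::value,
              "user-provided copy constructor is non-trivial");
static_assert(std::is_trivially_copyable<NewStyleSize>::value,
              "defaulted copy constructor is trivial");
static_assert(sizeof(OldStyleSize) == sizeof(NewStyleSize),
              "the memory layout itself is unchanged");

// By-value parameters: this is where the calling convention matters.
int area(OldStyleSize s) { return s.width * s.height; }
int area(NewStyleSize s) { return s.width * s.height; }

// Small self-check that can be called from any main().
void self_check() {
    // Within one consistently compiled program both variants work fine.
    assert(area(OldStyleSize(3, 3)) == 9);
    assert(area(NewStyleSize(3, 3)) == 9);
}
```

Both variants behave identically when caller and callee are compiled against the same definition. The crash only appears when a caller built against one definition calls into a library built against the other, which is why a rebuild of the caller is enough to fix it.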

According to upstream, OpenCV 4.x does not provide ABI compatibility: https://github.com/opencv/opencv/issues/20878

https://build.opensuse.org/request/show/936484 will enforce rebuilds of packages using OpenCV on every version change.

Until that is merged, we'll have to make sure that os-autoinst gets rebuilt after each OpenCV update.

#8 Updated by ggardet_arm 5 months ago

Thanks for all the details and tests, Fabian!

#9 Updated by ggardet_arm 5 months ago

  • Status changed from Workable to Resolved

I tested os-autoinst 4.6.1638540755.a348c6d8 built against latest opencv and it works. So, we should be good now.

#10 Updated by okurz 5 months ago

+1, thx!

#11 Updated by ggardet_arm 5 months ago

  • Subject changed from Tests on Raspberry Pi 2/3/4 are broken to opencv update broke os-autoinst (was: Tests on Raspberry Pi 2/3/4 are broken)
