action #103518
closed
opencv update broke os-autoinst (was: Tests on Raspberry Pi 2/3/4 are broken)
Description
Tests on Raspberry Pi 2/3/4 have been broken since the last Tumbleweed host update.
Log:
GOT GO
[2021-12-06T09:04:37.319492+01:00] [debug] Snapshots are not supported
[2021-12-06T09:04:37.336061+01:00] [debug] ||| starting prepare_firstboot tests/jeos/prepare_firstboot.pm
[2021-12-06T09:04:37.339090+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console
[2021-12-06T09:04:37.339723+01:00] [debug] <<< testapi::select_console(testapi_console="root-ssh")
[2021-12-06T09:04:37.355665+01:00] [debug] Establishing VNC connection to localhost:33181
[2021-12-06T09:04:37.586016+01:00] [debug] Connected to Xvnc - PID 10542
icewm PID is 10546
[2021-12-06T09:04:38.605791+01:00] [debug] Wait for SSH on host 192.168.0.56 (timeout: 240)
xterm PID is 11012
[2021-12-06T09:06:52.797570+01:00] [debug] <<< backend::baseclass::start_ssh_serial(hostname="192.168.0.56", password="SECRET", username="root")
[2021-12-06T09:06:52.798257+01:00] [debug] <<< backend::baseclass::new_ssh_connection(password="SECRET", hostname="192.168.0.56", username="root")
[2021-12-06T09:06:53.048026+01:00] [debug] SSH connection to root@192.168.0.56 established
[2021-12-06T09:06:54.530075+01:00] [debug] ssh xterm vt: grabbing serial console
[2021-12-06T09:06:54.585865+01:00] [debug] led state 0 0 0 -261
[2021-12-06T09:06:54.635047+01:00] [debug] activate_console, console: root-ssh, type: ssh
[2021-12-06T09:06:54.635678+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console -> lib/susedistribution.pm:807 called susedistribution::handle_password_prompt -> lib/susedistribution.pm:48 called testapi::assert_screen
[2021-12-06T09:06:54.636039+01:00] [debug] <<< testapi::assert_screen(mustmatch="password-prompt", timeout=60)
X connection to :33181 broken (explicit kill or server shutdown).
[2021-12-06T09:06:54.695511+01:00] [debug] backend process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":33181"
[2021-12-06T09:06:54.698685+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 10546 and exit status: 1
[2021-12-06T09:06:54.699638+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 11014 and exit status: 0
[2021-12-06T09:06:54.699977+01:00] [debug] stopping command server 7816 because test execution ended
[2021-12-06T09:06:54.700242+01:00] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20033/XhYJpnZ3EKa3Iiaf/broadcast
[2021-12-06T09:06:54.701610+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 11012 and exit status: 84
[2021-12-06T09:06:54.708448+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 10542 and exit status: 0
[2021-12-06T09:06:55.067383+01:00] [info] ::: backend::driver::__ANON__: Driver backend collected unknown process with pid 7825 and exit status: 0
[2021-12-06T09:06:55.671518+01:00] [debug] commands process exited: 0
[2021-12-06T09:06:55.772073+01:00] [debug] done with command server
[2021-12-06T09:06:55.772351+01:00] [debug] stopping autotest process 7819
[2021-12-06T09:06:55.773228+01:00] [debug] autotest received signal TERM, saving results of current test before exiting
[2021-12-06T09:06:55.783371+01:00] [debug] [autotest] process exited: 1
[2021-12-06T09:06:55.883881+01:00] [debug] done with autotest process
[2021-12-06T09:06:55.884093+01:00] [debug] isotovideo failed
[2021-12-06T09:06:55.885336+01:00] [debug] stopping backend process 7820
[2021-12-06T09:06:55.885547+01:00] [debug] done with backend process
7812: EXIT 1
[2021-12-06T09:06:56.026552+01:00] [info] Isotovideo exit status: 1
[2021-12-06T09:06:56.160740+01:00] [info] +++ worker notes +++
[2021-12-06T09:06:56.161131+01:00] [info] End time: 2021-12-06 08:06:56
[2021-12-06T09:06:56.161325+01:00] [info] Result: died
Updated by okurz almost 3 years ago
- Status changed from New to Feedback
- Assignee set to okurz
- Target version set to Ready
Could be related to https://github.com/os-autoinst/os-autoinst/pull/1868 but I don't see any obvious error message.
The first bad job in this scenario, https://openqa.opensuse.org/tests/2069354/logfile?filename=autoinst-log.txt, shows "Current version is 4.6.1638454781.7b07525b [interface v24]"; the last good one, https://openqa.opensuse.org/tests/2066263/logfile?filename=autoinst-log.txt, has "Current version is 4.6.1638007345.ae6eed2a [interface v24]". https://github.com/os-autoinst/os-autoinst/compare/ae6eed2a...7b07525b shows 15 commits. I suspect that either https://github.com/os-autoinst/os-autoinst/commit/68625004e128b5565353fcbf5ff366bb271ebb74 or https://github.com/os-autoinst/os-autoinst/commit/b1bfd883c185034f1c48385636ba8cef9bb17f24 causes the problem here. Could you try going back to an os-autoinst version from before the mentioned pull request? Alternatively, could you give me access?
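For comparing the version strings quoted above, here is a minimal C++17 sketch that splits them into their parts. It assumes the third field is a Unix timestamp of the packaged commit and the fourth an abbreviated git hash. That reading is a guess from the format, not something confirmed in this ticket. `BuildVersion`, `parse_build_version` and `is_older` are hypothetical names, not part of os-autoinst.

```cpp
#include <optional>
#include <string>

// Hypothetical helper type; not part of os-autoinst.
struct BuildVersion {
    std::string base;    // e.g. "4.6"
    long long stamp;     // ASSUMPTION: Unix timestamp of the packaged commit
    std::string commit;  // abbreviated git commit hash
};

// Parses strings such as "4.6.1638454781.7b07525b".
// An optional package release suffix ("-1.1") is dropped first.
std::optional<BuildVersion> parse_build_version(const std::string& raw) {
    const std::string v = raw.substr(0, raw.find('-'));
    const auto last = v.rfind('.');
    if (last == std::string::npos || last == 0) return std::nullopt;
    const auto prev = v.rfind('.', last - 1);
    if (prev == std::string::npos) return std::nullopt;
    const std::string stamp = v.substr(prev + 1, last - prev - 1);
    // Accept only a plain decimal number short enough to fit in long long.
    if (stamp.empty() || stamp.size() > 18 ||
        stamp.find_first_not_of("0123456789") != std::string::npos) {
        return std::nullopt;
    }
    return BuildVersion{v.substr(0, prev), std::stoll(stamp), v.substr(last + 1)};
}

// Orders two builds by the (assumed) timestamp field.
bool is_older(const BuildVersion& a, const BuildVersion& b) {
    return a.stamp < b.stamp;
}
```

Under that assumption, `is_older` places ae6eed2a (last good) before 7b07525b (first bad), which matches the direction of the compare link.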
Updated by ggardet_arm almost 3 years ago
Trying with os-autoinst from aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1
at https://openqa.opensuse.org/tests/2072532
Updated by ggardet_arm almost 3 years ago
ggardet_arm wrote:
Trying with os-autoinst from aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1
at https://openqa.opensuse.org/tests/2072532
This version indeed works! What would you like me to try in order to debug this further?
Updated by okurz almost 3 years ago
- I wonder why we don't see error messages that point to the real problem. I would expect Perl warnings like "Too many arguments for subroutine $foo" or similar. You might be able to see more when running tests on that machine manually with isotovideo. Could you put the vars.json file of one job, e.g. https://openqa.opensuse.org/tests/2072487, into a local directory on that worker host, provide the assets there locally, e.g. symlink "openSUSE-Tumbleweed-ARM-JeOS-raspberrypi2.armv7l-2021.12.02-Snapshot20211202.raw.xz" from the worker cache into /tmp, and run isotovideo from that directory? Then you might see more.
- Do you know which consoles are being used? Only some files have been changed in https://github.com/os-autoinst/os-autoinst/commit/b1bfd883c185034f1c48385636ba8cef9bb17f24
- As the changes in the mentioned commits are Perl code changes only, you can try to apply them manually and bisect this way.
Updated by ggardet_arm almost 3 years ago
https://build.opensuse.org/package/show/devel:openQA/os-autoinst shows errors:
[ 141s] The following tests FAILED:
[ 141s] 3 - test-perl-testsuite (Failed)
Maybe related?
With the latest OpenCV, we now hit the following error:
[2021-12-08T10:00:42.317288+01:00] [debug] tests/jeos/prepare_firstboot.pm:32 called testapi::select_console -> lib/susedistribution.pm:807 called susedistribution::handle_password_prompt -> lib/susedistribution.pm:48 called testapi::assert_screen
[2021-12-08T10:00:42.317662+01:00] [debug] <<< testapi::assert_screen(mustmatch="password-prompt", timeout=60)
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.5.4) /home/abuild/rpmbuild/BUILD/opencv-4.5.4/modules/imgproc/src/smooth.dispatch.cpp:293: error: (-215:Assertion failed) ksize.width > 0 && ksize.width % 2 == 1 && ksize.height > 0 && ksize.height % 2 == 1 in function 'createGaussianKernels'
Unexpected end of data 0
X connection to :37707 broken (explicit kill or server shutdown).
[2021-12-08T10:00:42.362835+01:00] [debug] backend process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":37707"
See: https://openqa.opensuse.org/tests/2076217
I am still using os-autoinst from the aarch64 Tumbleweed OSS repo: 4.6.1638289529.0a3f5b98-1.1
Updated by favogt almost 3 years ago
- Status changed from Feedback to Workable
I can reproduce the crash on x86_64 Tumbleweed as well, after upgrading OpenCV 4.5.2-2.2 -> 4.5.4-1.1.
With GDB, it looks like the ABI got broken and inside OpenCV the value of ksize is corrupt: $2 = {width = 0xe87a3e08, height = 0x7fff}
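To see why those values trip the assertion quoted in the cv::Exception message, here is a small sketch that mirrors the condition without linking OpenCV. `KernelSize` is a hypothetical stand-in for cv::Size, and the 3x3 kernel is only an assumed example of a sane request. What tinycv actually passes is not shown in this ticket.

```cpp
#include <cstdint>

// Hypothetical stand-in for cv::Size so the sketch builds without OpenCV.
struct KernelSize {
    std::int32_t width;
    std::int32_t height;
};

// Same condition as in the assertion text from createGaussianKernels:
// both dimensions must be positive and odd.
constexpr bool is_valid_gaussian_ksize(KernelSize k) {
    return k.width > 0 && k.width % 2 == 1 &&
           k.height > 0 && k.height % 2 == 1;
}

// Reinterprets the raw 32-bit patterns printed by GDB as signed ints.
constexpr KernelSize from_raw(std::uint32_t w, std::uint32_t h) {
    return KernelSize{static_cast<std::int32_t>(w), static_cast<std::int32_t>(h)};
}

// An assumed sane kernel passes the check.
static_assert(is_valid_gaussian_ksize(KernelSize{3, 3}), "3x3 should be valid");

// The values seen in GDB do not: 0xe87a3e08 has the top bit set, so it is
// negative as a signed 32-bit int and fails "width > 0".
static_assert(!is_valid_gaussian_ksize(from_raw(0xe87a3e08u, 0x7fffu)),
              "corrupt ksize should be rejected");
```

The height 0x7fff happens to be positive and odd, so the width alone is what triggers the abort here.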
Updated by favogt almost 3 years ago
The ABI breakage is because class Size is trivially copyable in the new version, so it can be passed in a register ($rdx). In the older version it's not the case, and it's pushed to the stack instead. So tinycv writes 0x30000003 to the stack, but the OpenCV library reads $rdx again, which contains random stuff.
According to upstream, OpenCV 4.x does not provide ABI compatibility: https://github.com/opencv/opencv/issues/20878
https://build.opensuse.org/request/show/936484 will enforce rebuilds of packages using OpenCV on every version change.
Until that is merged, we'll have to ensure that os-autoinst got rebuilt.
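The effect described above can be illustrated with two look-alike classes. This is only a sketch: `SizeOld` and `SizeNew` are hypothetical stand-ins, and it assumes the relevant difference is a user-provided copy constructor in the old version versus a defaulted one in the new version, which is not spelled out in this ticket. On ABIs such as the Itanium C++ ABI used on Linux, a class with a non-trivial copy constructor is passed by value through memory, while a small trivially copyable one can travel in registers.

```cpp
#include <type_traits>

// Hypothetical "old" layout: the user-provided copy constructor makes the
// class non-trivially copyable, so by-value arguments go through memory.
struct SizeOld {
    int width;
    int height;
    SizeOld(int w, int h) : width(w), height(h) {}
    SizeOld(const SizeOld& other) : width(other.width), height(other.height) {}
};

// Hypothetical "new" layout: same members, but the copy constructor is
// defaulted, so the class is trivially copyable and eligible for registers.
struct SizeNew {
    int width;
    int height;
    SizeNew(int w, int h) : width(w), height(h) {}
    SizeNew(const SizeNew&) = default;
};

// Identical size and members, different calling convention.
static_assert(sizeof(SizeOld) == sizeof(SizeNew), "same object layout");
static_assert(!std::is_trivially_copyable<SizeOld>::value,
              "old variant is passed via memory");
static_assert(std::is_trivially_copyable<SizeNew>::value,
              "new variant may be passed in a register");
```

A caller compiled against the `SizeOld` headers and a library compiled against the `SizeNew` headers would disagree about where the argument lives, even though both sources look the same at the call site. That is why a rebuild of os-autoinst is enough to fix it.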
Updated by ggardet_arm almost 3 years ago
Thanks for all the details and tests Fabian!
Updated by ggardet_arm almost 3 years ago
- Status changed from Workable to Resolved
I tested os-autoinst 4.6.1638540755.a348c6d8 built against the latest opencv and it works. So, we should be good now.
Updated by ggardet_arm almost 3 years ago
- Subject changed from Tests on Raspberry Pi 2/3/4 are broken to opencv update broke os-autoinst (was: Tests on Raspberry Pi 2/3/4 are broken)