action #117352

closed

OBS build fails in t/29-backend-generalhw.t size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-09-29
Due date:
2022-10-20
% Done:

0%

Estimated time:

Description

Observation

Visit https://build.opensuse.org/package/live_build_log/devel:openQA/os-autoinst/openSUSE_Leap_15.4/aarch64

Package devel:openQA/os-autoinst failed to build in openSUSE_Leap_15.4/aarch64

Check out the package for editing:
osc checkout devel:openQA os-autoinst

Last lines of build log:

[  556s] 3: [20:58:54] ./t/29-backend-generalhw.t ................. 
[  556s] 3: ok 1 - can check socket
[  556s] 3: # Subtest: start VM
[  556s] 3:     ok 1 - return value
[  556s] 3:     ok 2 - poweroff/on commands invoked
[  556s] 3:     ok 3 - tried to connect to VNC server
[  556s] 3:     1..3
[  556s] 3: ok 2 - start VM
[  556s] 3: # Subtest: start VM with video
[  556s] 3:     ok 1 - return value
[  556s] 3:     ok 2 - poweroff/on commands invoked
[  556s] 3:     ok 3 - tried to connect to video stream
[  556s] 3:     1..3
[  556s] 3: ok 3 - start VM with video
[  556s] 3: # Subtest: hdd args
[  556s] 3:     ok 1 - return value
[  556s] 3:     1..1
[  556s] 3: ok 4 - hdd args
[  556s] 3: # Subtest: stop VM
[  556s] 3:     ok 1 - return value
[  556s] 3:     ok 2 - poweroff/on commands invoked
[  556s] 3:     1..2
[  556s] 3: ok 5 - stop VM
[  556s] 3: # Subtest: error handling
[  556s] 3:     ok 1 - IPC error thrown with context
[  556s] 3:     ok 2 - error when GENERAL_HW_CMD_DIR is not a directory
[  556s] 3:     ok 3 - WORKER_HOSTNAME required
[  556s] 3:     1..3
[  556s] 3: ok 6 - error handling
[  556s] 3: # Subtest: handling power commands
[  556s] 3:     ok 1 - power commands invoked
[  556s] 3:     ok 2 - dies on invalid action
[  556s] 3:     1..2
[  556s] 3: ok 7 - handling power commands
[  556s] 3: # Subtest: re-login VNC
[  556s] 3:     ok 1 - re-login has truthy return code
[  556s] 3:     ok 2 - VNC base console assigned
[  556s] 3:     ok 3 - previously assigned VNC socket closed
[  556s] 3:     1..3
[  556s] 3: ok 8 - re-login VNC
[  556s] 3: # Subtest: serial grab
[  556s] 3:     # Subtest: capturing output
[  556s] 3:         ok 1 - serial PID assigned: 4324
[  556s] 3:         ok 2 - serial output captured
[  556s] 3:         1..2
[  556s] 3:     ok 1 - capturing output
[  556s] 3:     # Subtest: stop grabbing
[  556s] 3:         1..0
[  556s] 3:     not ok 2 - No tests run for subtest "stop grabbing"
[  556s] 3:         # No tests run!
[  556s] 3: 
[  556s] 3:     #   Failed test 'No tests run for subtest "stop grabbing"'
[  556s] 3:     #   at ./t/29-backend-generalhw.t line 174.
[  556s] 3:     1..2
[  556s] 3: not ok 9 - serial grab
[  556s] 3: ok 10 - no (unexpected) warnings (via END block)
[  556s] 3:     # Looks like you failed 1 test of 2.
[  556s] 3: 
[  556s] 3: #   Failed test 'serial grab'
[  556s] 3: #   at ./t/29-backend-generalhw.t line 177.
[  556s] 3: Can't kill('-TERM', '4325'): No such process at /home/abuild/rpmbuild/BUILD/os-autoinst-4.6.1664377866.1f6d57e/backend/generalhw.pm line 186
[  556s] 3: # Tests were run but no plan was declared and done_testing() was not seen.
[  556s] 3: # Looks like your test exited with 255 just after 10.
[  556s] 3: Dubious, test returned 255 (wstat 65280, 0xff00)
[  556s] 3: Failed 1/10 subtests 

...

[  608s] 3: 
[  608s] 3: Test Summary Report
[  608s] 3: -------------------
[  608s] 3: ./t/29-backend-generalhw.t               (Wstat: 65280 Tests: 10 Failed: 1)
[  608s] 3:   Failed test:  9
[  608s] 3:   Non-zero exit status: 255
[  608s] 3:   Parse errors: No plan found in TAP output
[  608s] 3: Files=59, Tests=1266, 373 wallclock secs ( 1.73 usr  0.82 sys + 319.81 cusr 51.16 csys = 373.52 CPU)
[  608s] 3: Result: FAIL
[29409s] qemu-system-aarch64: terminating on signal 15 from pid 27497 ()

That's not the first occurrence: https://progress.opensuse.org/issues/111254#note-14

Suggestions

  • Disable the subtest in OBS
  • Try and reproduce this locally
  • ~Confirm that this is arm-specific (we suspect it's not)~
  • Double-check behavior of spawning and killing a process e.g. from this unit test, maybe it doesn't "sleep" properly
Actions #1

Updated by tinita over 1 year ago

  • Description updated (diff)
Actions #2

Updated by livdywan over 1 year ago

  • Subject changed from OBS ARM build fails in t/29-backend-generalhw.t to OBS build fails in t/29-backend-generalhw.t size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by mkittler over 1 year ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler over 1 year ago

  • Status changed from Workable to In Progress
Actions #5

Updated by okurz over 1 year ago

merged.

Actions #6

Updated by openqa_review over 1 year ago

  • Due date set to 2022-10-20

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by mkittler over 1 year ago

  • Status changed from In Progress to Feedback

I'm not sure how to fix this test as I fail to see the problem. Maybe someone else from the team can have a look at the test (and the code being tested)? It is actually not really complicated.

Note that I tested locally what happens when the sleep is missing (so the command terminates immediately). In that case one does not actually run into the error shown in the ticket description, because the process remains around as a zombie until it has been waited for. That is easily observable by putting a sleep (or rather a long loop, because sleep is mocked in that test) before the stop function. So I suppose the problem cannot be that sleep isn't invoked correctly.
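The zombie behavior described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not code from os-autoinst; the function name signal_unreaped_child is hypothetical, and the sketch signals a single PID rather than a process group as the '-TERM' call in the log does. It assumes a POSIX system with the default SIGCHLD handling.

```python
import os
import signal
import time


def signal_unreaped_child() -> bool:
    """Fork a child that exits at once, then signal it before reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately, like a command without sleep
    time.sleep(0.2)  # give the child time to exit; until reaped it is a zombie
    try:
        os.kill(pid, signal.SIGTERM)  # the PID still exists, so no error is expected
        delivered = True
    except ProcessLookupError:
        delivered = False
    os.waitpid(pid, 0)  # reap the zombie, which releases its PID
    return delivered
```

On Linux this is expected to return True: as long as the parent has not waited for the child, the PID is still valid and signalling it does not fail with "No such process".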

Since generalhw.pm has use autodie ':all'; in place, the lack of explicit error handling in start_serial_grab and stop_serial_grab should actually be ok. If forking the subprocess had not worked, we would have seen an error message instead of a plausible PID being logged for the forked process. In fact, that error handling is what causes the test to fail.
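As a rough analogy for that error handling, the following Python sketch shows a failing kill surfacing as an exception instead of a silently ignored return value. It is an assumption-laden stand-in, not the Perl code from generalhw.pm: Python's os.kill always raises on failure, which only approximates what autodie does for Perl's kill, and the function name signal_reaped_child is hypothetical.

```python
import os
import signal


def signal_reaped_child() -> str:
    """Fork a child, reap it, then try to signal the stale PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    os.waitpid(pid, 0)  # reap the child; the PID no longer refers to a process
    try:
        os.kill(pid, signal.SIGTERM)  # expected to fail with ESRCH
    except ProcessLookupError as err:
        # Comparable to the "Can't kill(...): No such process" message in the log
        return f"kill failed: {err.strerror}"
    return "signal delivered"
```

Barring immediate PID reuse, the call is expected to take the exception path, which corresponds to the "No such process" failure in the build log: the signal targets an ID that no longer exists at that moment.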

Actions #8

Updated by mkittler over 1 year ago

  • Status changed from Feedback to Resolved

It doesn't look like anybody else wants to have a look, and I'm not sure what the problem is. So I'm resolving the issue by just keeping the test disabled in OBS. (I haven't seen any failures in GitHub Actions.)
