action #71845
closedtests fail in circleci with "Non-zero wait status: <non-negative number>" but "All <X> subtests passed"
Description
Observation¶
For example
https://app.circleci.com/pipelines/github/os-autoinst/openQA/4328/workflows/2380a4fe-7ea3-4f46-b0ad-99f9aacb9e7e/jobs/41532/steps
shows:
[08:43:42] t/api/09-comments.t ................ All 7 subtests passed
[08:43:58] t/api/10-jobgroups.t ............... ok 17566 ms ( 0.17 usr 0.01 sys + 15.13 cusr 0.81 csys = 16.12 CPU)
[08:44:16] t/api/11-bugs.t .................... ok 14348 ms ( 0.03 usr 0.00 sys + 12.18 cusr 1.48 csys = 13.69 CPU)
[08:44:30] t/api/12-admin-workers.t ........... ok 14216 ms ( 0.00 usr 0.00 sys + 11.71 cusr 1.83 csys = 13.54 CPU)
[08:44:44] t/api/13-influxdb.t ................ ok 13538 ms ( 0.01 usr 0.00 sys + 11.65 cusr 1.26 csys = 12.92 CPU)
[08:44:58] t/api/14-plugin_obs_rsync_async.t .. ok 46127 ms ( 0.07 usr 0.00 sys + 16.40 cusr 2.93 csys = 19.40 CPU)
[08:45:44] t/api/14-plugin_obs_rsync.t ........ ok 97631 ms ( 0.10 usr 0.01 sys + 81.41 cusr 10.57 csys = 92.09 CPU)
[08:47:22] t/api/15-search.t .................. ok 13592 ms ( 0.02 usr 0.00 sys + 11.22 cusr 1.63 csys = 12.87 CPU)
[08:47:35] t/api/15-users.t ................... ok 12358 ms ( 0.01 usr 0.00 sys + 10.75 cusr 1.02 csys = 11.78 CPU)
[08:47:48]
Test Summary Report
-------------------
t/api/09-comments.t (Wstat: 14 Tests: 7 Failed: 0)
Non-zero wait status: 14
t/api/09-comments.t reported "All 7 subtests passed" and also the "tests" tab in circleCI does show that all tests passed and the artifact file, e.g. https://circle-production-customer-artifacts.s3.amazonaws.com/picard/forks/58f7029dc9e77c000129905e/46416941/5f6c5695c60fc519f3c9f54a-0-build/artifacts/artifacts/api/09-comments.t?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200924T124432Z&X-Amz-SignedHeaders=host&X-Amz-Expires=60&X-Amz-Credential=AKIAJR3Q6CR467H7Z55A%2F20200924%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=3605db1a4895c0bab1d91b1656cb39c5bfede6a15b027eed28259d127a4f7542 shows in the end:
ok 7 - no (unexpected) warnings (via done_testing)
1..7
so all tests passed and "done_testing()" has been passed.
- https://app.circleci.com/pipelines/github/os-autoinst/openQA/4328/workflows/2380a4fe-7ea3-4f46-b0ad-99f9aacb9e7e/jobs/41532/steps shows that t/api/09-comments.t failed
- https://app.circleci.com/pipelines/github/os-autoinst/openQA/4329/workflows/01c4a5ce-86aa-4ba7-b77e-4a5a3d216c3d/jobs/41542 shows t/api/01-workers.t failed as well as t/api/02-iso.t .
Problem¶
tests abort even though everything else looks fine. We do not know what is wrong here. https://perldoc.perl.org/Test/More.html#EXIT-CODES
describes that Test::Builder itself would return 0 if all tests passed, 255 if the test aborted prematurely or a non-zero number corresponding to the number of failed tests if any. So this is most likely not the problem here.
Reading the source code of TAP::Formatter suggests that "Non-zero wait status: " has something to do with the exit code of other processes that we might wait for (implicitly or explicitly).
- H1: could be that it only happens in circleci because we run with "Devel::Cover", compare to https://stackoverflow.com/questions/36639621/what-does-non-zero-wait-status-9-mean-when-running-test-using-develcover
- H2: Something to do with processes in the background, maybe because they close stdout/stderr? See https://github.com/UPC/ravada/issues/194
- H3: It's because of SIGALRM due to OpenQA::Test::TimeLimit and we somehow do not get the message that the test timed out , see
man 7 signal
. Maybe because subprocesses do not handle the alarm in the same way as the main process should do. But the tests already finished so this might still be about stuck background processes