action #95024
Updated by okurz almost 3 years ago
## Observation
https://app.circleci.com/pipelines/github/os-autoinst/openQA/6870/workflows/d884128b-fac7-4852-91b2-baf739f648a4/jobs/64708?invite=true#step-108-99 shows
```
RETRY=5 timeout -s SIGINT -k 5 -v $((5 * (5 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer t/ui/26-jobs_restart.t
Retry 1 of 5 …
[19:34:35] t/ui/26-jobs_restart.t .. All 10 subtests passed
[19:35:36]
Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
Non-zero wait status: 14
Files=1, Tests=10, 60.7488 wallclock secs ( 0.36 usr 0.05 sys + 49.70 cusr 1.54 csys = 51.65 CPU)
Result: FAIL
Retry 2 of 5 …
[19:35:37] t/ui/26-jobs_restart.t .. All 10 subtests passed
[19:36:37]
Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
Non-zero wait status: 14
Files=1, Tests=10, 60.8001 wallclock secs ( 0.36 usr 0.03 sys + 49.43 cusr 1.59 csys = 51.41 CPU)
Result: FAIL
Retry 3 of 5 …
[19:36:38] t/ui/26-jobs_restart.t .. All 10 subtests passed
[19:37:39]
Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
Non-zero wait status: 14
Files=1, Tests=10, 60.7867 wallclock secs ( 0.36 usr 0.02 sys + 49.28 cusr 1.55 csys = 51.21 CPU)
Result: FAIL
Retry 4 of 5 …
[19:37:40] t/ui/26-jobs_restart.t .. All 10 subtests passed
[19:38:41]
Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
Non-zero wait status: 14
Files=1, Tests=10, 60.8044 wallclock secs ( 0.35 usr 0.03 sys + 49.73 cusr 1.48 csys = 51.59 CPU)
Result: FAIL
Retry 5 of 5 …
[19:38:42] t/ui/26-jobs_restart.t .. All 10 subtests passed
[19:39:42]
Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
Non-zero wait status: 14
Files=1, Tests=10, 60.8018 wallclock secs ( 0.38 usr 0.02 sys + 49.00 cusr 1.59 csys = 50.99 CPU)
Result: FAIL
make[2]: *** [Makefile:188: test-unit-and-integration] Error 1
make[2]: Leaving directory '/home/squamata/project'
make[1]: *** [Makefile:183: test-with-database] Error 2
make[1]: Leaving directory '/home/squamata/project'
make: *** [Makefile:168: test-unstable] Error 2
Exited with code exit status 2
```
## Expected result
* At most 1 failure in 3 runs, as before; ideally 0 failures in 100 runs
## Suggestions
The tests report "All 10 subtests passed" but then fail with "Non-zero wait status: 14". We have had similar cases in the past. This likely has something to do with the cleanup of background processes. Maybe we recently introduced a regression that makes this background process handling behave differently.
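As a quick aid for reading such reports, here is a minimal sketch that decodes a wait status, assuming "Wstat" is the raw POSIX wait status as in Perl's `$?` (low 7 bits: terminating signal, high byte: exit code). The helper name `decode_wstat` is made up for illustration and is not part of openQA.

```shell
#!/bin/sh
# Hypothetical helper: decode a raw wait status as printed by prove ("Wstat").
# Assumption: same layout as Perl's $? (low 7 bits = signal, high byte = exit code).
decode_wstat() {
    wstat=$1
    sig=$(( wstat & 127 ))   # terminating signal, 0 if the process exited normally
    code=$(( wstat >> 8 ))   # exit code, only meaningful if sig is 0
    if [ "$sig" -ne 0 ]; then
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with code $code"
    fi
}

decode_wstat 14    # on Linux: killed by signal 14 (SIGALRM)
decode_wstat 256   # exited with code 1
```

Signal 14 being SIGALRM would fit a test that is aborted by an alarm-based timeout rather than by a failing assertion.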
* I suggest trying to reproduce the failure locally, then bisecting between "last good" and "first bad" to find the culprit
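* docs/Contributing.asciidoc explains that "Non-zero wait status: 14" means that the test timed out. As all five retries hit it, the test now consistently times out, so either something makes it very slow *or* the test effectively never ends. Each attempt in the log above takes about 60.8 wallclock seconds, which fits a fixed timeout. This can easily be checked locally by running the test a few times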
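
Both suggestions could be combined along these lines. This is only a sketch: the function name `run_repeatedly` and the run count are made up, and it assumes the test can be run locally with `prove -l` from the repository root with whatever environment the test needs already set up.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and report the failure ratio.
# Usage: run_repeatedly RUNS CMD [ARGS...]
# Returns non-zero if at least one run failed.
run_repeatedly() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Send the command's own output to stderr so stdout only has the summary
        "$@" >&2 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$runs failures"
    [ "$failures" -eq 0 ]
}

# Example call (commented out, needs an openQA checkout):
# run_repeatedly 10 prove -l t/ui/26-jobs_restart.t
```

Saved as a script that ends with the example call, this can also drive `git bisect run` after `git bisect start <first-bad> <last-good>`, because the non-zero return marks a commit as bad. With a flaky test the run count has to be high enough that a "good" verdict is trustworthy.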