action #95024

Updated by okurz almost 3 years ago

## Observation 

 https://app.circleci.com/pipelines/github/os-autoinst/openQA/6870/workflows/d884128b-fac7-4852-91b2-baf739f648a4/jobs/64708?invite=true#step-108-99 shows 

 ``` 
 RETRY=5 timeout -s SIGINT -k 5 -v $((5 * (5 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer t/ui/26-jobs_restart.t 
 Retry 1 of 5 … 
 [19:34:35] t/ui/26-jobs_restart.t ..         All 10 subtests passed  
 [19:35:36] 

 Test Summary Report 
 ------------------- 
 t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0) 
   Non-zero wait status: 14 
 Files=1, Tests=10, 60.7488 wallclock secs ( 0.36 usr    0.05 sys + 49.70 cusr    1.54 csys = 51.65 CPU) 
 Result: FAIL 
 Retry 2 of 5 … 
 [19:35:37] t/ui/26-jobs_restart.t ..         All 10 subtests passed  
 [19:36:37] 

 Test Summary Report 
 ------------------- 
 t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0) 
   Non-zero wait status: 14 
 Files=1, Tests=10, 60.8001 wallclock secs ( 0.36 usr    0.03 sys + 49.43 cusr    1.59 csys = 51.41 CPU) 
 Result: FAIL 
 Retry 3 of 5 … 
 [19:36:38] t/ui/26-jobs_restart.t ..         All 10 subtests passed  
 [19:37:39] 

 Test Summary Report 
 ------------------- 
 t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0) 
   Non-zero wait status: 14 
 Files=1, Tests=10, 60.7867 wallclock secs ( 0.36 usr    0.02 sys + 49.28 cusr    1.55 csys = 51.21 CPU) 
 Result: FAIL 
 Retry 4 of 5 … 
 [19:37:40] t/ui/26-jobs_restart.t ..         All 10 subtests passed  
 [19:38:41] 

 Test Summary Report 
 ------------------- 
 t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0) 
   Non-zero wait status: 14 
 Files=1, Tests=10, 60.8044 wallclock secs ( 0.35 usr    0.03 sys + 49.73 cusr    1.48 csys = 51.59 CPU) 
 Result: FAIL 
 Retry 5 of 5 … 
 [19:38:42] t/ui/26-jobs_restart.t ..         All 10 subtests passed  
 [19:39:42] 

 Test Summary Report 
 ------------------- 
 t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0) 
   Non-zero wait status: 14 
 Files=1, Tests=10, 60.8018 wallclock secs ( 0.38 usr    0.02 sys + 49.00 cusr    1.59 csys = 50.99 CPU) 
 Result: FAIL 
 make[2]: *** [Makefile:188: test-unit-and-integration] Error 1 
 make[2]: Leaving directory '/home/squamata/project' 
 make[1]: *** [Makefile:183: test-with-database] Error 2 
 make[1]: Leaving directory '/home/squamata/project' 
 make: *** [Makefile:168: test-unstable] Error 2 

 Exited with code exit status 2 
 ``` 
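In the log, `Wstat: 14` is the raw wait status of the test process as collected by the harness. Below is a minimal bash sketch for decoding such a value. It assumes the usual Linux encoding: the signal number sits in the low 7 bits, the core-dump flag in bit 7, and the exit code in bits 8 to 15. `decode_wstat` is a hypothetical helper, not part of openQA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a raw wait status such as prove's "Wstat" value.
# Assumed layout (Linux): low 7 bits = terminating signal,
# bit 7 = core dump flag, bits 8-15 = exit code.
decode_wstat() {
  local wstat=$1
  local sig=$(( wstat & 127 ))
  local core=$(( (wstat & 128) != 0 ))
  local code=$(( (wstat >> 8) & 255 ))
  if (( sig != 0 )); then
    # bash's "kill -l N" prints the signal name without the SIG prefix
    echo "killed by signal $sig (SIG$(kill -l "$sig")), core dumped: $core"
  else
    echo "exited with code $code"
  fi
}

decode_wstat 14
```

For 14, the exit-code bits are zero and the signal bits are 14. That means the test did not exit with a failure code; it was killed by SIGALRM.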

 ## Expected result 
 * At least back to 1/3 failures, not more; ideally 0/100 failures

 ## Suggestions 

 The tests report "All 10 subtests passed" but then fail with "Non-zero wait status: 14". We have had similar cases in the past. This likely has something to do with the cleanup of background processes. Maybe we recently introduced a regression so that this background process handling now behaves differently.
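The paragraph above suspects cleanup of background processes. The bash sketch below shows one way a test can emit a complete, passing TAP stream and still end with wait status 14. It assumes the test blocks during cleanup on a background process and that a time limit then delivers SIGALRM. `fake_test` is a hypothetical stand-in for a test file, and the signal is sent manually with `kill -ALRM` to emulate the time limit.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a test file (not the real t/ui/26-jobs_restart.t):
# it prints a complete, passing TAP stream, then blocks in its "cleanup"
# waiting for a background process that has not finished yet.
fake_test() {
  echo "1..2"
  echo "ok 1 - first subtest"
  echo "ok 2 - second subtest"
  # Lingering background process; output goes to /dev/null so it holds no pipes
  sleep 5 >/dev/null 2>&1 &
  wait   # "cleanup" waits for the child, so the test does not end on its own
}

tap_file=$(mktemp)
fake_test >"$tap_file" &
pid=$!

# Emulate an alarm()-based time limit firing after 1 second
sleep 1
kill -ALRM "$pid"
status=0
wait "$pid" || status=$?

tap_output=$(cat "$tap_file")
rm -f "$tap_file"

echo "TAP output:"
echo "$tap_output"
# A shell reports death by signal N as exit status 128 + N
echo "exit status $status = 128 + $(( status - 128 )) (SIG$(kill -l $(( status - 128 ))))"
```

Every subtest line is present in the TAP output, yet the process died from SIGALRM. bash reports this as 142 (128 + 14), while prove shows the raw wait status 14.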

 * I suggest trying to reproduce the failure locally and bisecting between the "last good" and "first bad" commits to find the culprit
 * docs/Contributing.asciidoc explains that "Non-zero wait status: 14" just means that the test times out (14 is SIGALRM), which now happens consistently. So either something makes the test extremely slow *or* the test effectively never ends. The roughly 60 s wall-clock time of every retry above fits this. This is easy to check locally by executing the test a few times.
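Checking how often the test fails locally, and then bisecting, could look roughly like the bash sketch below. `count_failures` is a hypothetical helper. The placeholders `<first-bad>` and `<last-good>` must be filled in by hand, and the commented commands assume an openQA checkout with the test dependencies installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print how many runs failed
# (i.e. returned a non-zero exit status). Command output is discarded.
count_failures() {
  local runs=$1; shift
  local failures=0 i
  for (( i = 1; i <= runs; i++ )); do
    if ! "$@" >/dev/null 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  echo "$failures"
}

# Example usage from an openQA checkout (assumed). Each timed-out run of this
# test took about 60 s in the CI log above:
#   count_failures 10 prove -l t/ui/26-jobs_restart.t
#
# Bisecting: "git bisect run" treats exit code 0 as good and 1-127 (except 125)
# as bad. Several runs per commit make a flaky failure less likely to be missed:
#   git bisect start <first-bad> <last-good>
#   git bisect run bash -c 'for i in 1 2 3; do prove -l t/ui/26-jobs_restart.t || exit 1; done'
```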
