action #95024

closed

openQA test t/ui/26-jobs_restart.t very unstable (already marked as unstable) size:M

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-07-02
Due date: 2021-07-22
% Done: 0%
Estimated time:
Description

Observation

https://app.circleci.com/pipelines/github/os-autoinst/openQA/6870/workflows/d884128b-fac7-4852-91b2-baf739f648a4/jobs/64708?invite=true#step-108-99 shows

RETRY=5 timeout -s SIGINT -k 5 -v $((5 * (5 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer t/ui/26-jobs_restart.t
Retry 1 of 5 …
[19:34:35] t/ui/26-jobs_restart.t ..       All 10 subtests passed 
[19:35:36]

Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
  Non-zero wait status: 14
Files=1, Tests=10, 60.7488 wallclock secs ( 0.36 usr  0.05 sys + 49.70 cusr  1.54 csys = 51.65 CPU)
Result: FAIL
Retry 2 of 5 …
[19:35:37] t/ui/26-jobs_restart.t ..       All 10 subtests passed 
[19:36:37]

Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
  Non-zero wait status: 14
Files=1, Tests=10, 60.8001 wallclock secs ( 0.36 usr  0.03 sys + 49.43 cusr  1.59 csys = 51.41 CPU)
Result: FAIL
Retry 3 of 5 …
[19:36:38] t/ui/26-jobs_restart.t ..       All 10 subtests passed 
[19:37:39]

Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
  Non-zero wait status: 14
Files=1, Tests=10, 60.7867 wallclock secs ( 0.36 usr  0.02 sys + 49.28 cusr  1.55 csys = 51.21 CPU)
Result: FAIL
Retry 4 of 5 …
[19:37:40] t/ui/26-jobs_restart.t ..       All 10 subtests passed 
[19:38:41]

Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
  Non-zero wait status: 14
Files=1, Tests=10, 60.8044 wallclock secs ( 0.35 usr  0.03 sys + 49.73 cusr  1.48 csys = 51.59 CPU)
Result: FAIL
Retry 5 of 5 …
[19:38:42] t/ui/26-jobs_restart.t ..       All 10 subtests passed 
[19:39:42]

Test Summary Report
-------------------
t/ui/26-jobs_restart.t (Wstat: 14 Tests: 10 Failed: 0)
  Non-zero wait status: 14
Files=1, Tests=10, 60.8018 wallclock secs ( 0.38 usr  0.02 sys + 49.00 cusr  1.59 csys = 50.99 CPU)
Result: FAIL
make[2]: *** [Makefile:188: test-unit-and-integration] Error 1
make[2]: Leaving directory '/home/squamata/project'
make[1]: *** [Makefile:183: test-with-database] Error 2
make[1]: Leaving directory '/home/squamata/project'
make: *** [Makefile:168: test-unstable] Error 2

Exited with code exit status 2

Expected result

  • Return at least to the previous failure rate of about 1/3, not more; ideally 0/100 failures
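
To compare against a target such as 1/3 or 0/100, the failure rate can be measured by running the test repeatedly and counting non-zero exits. A minimal sketch follows; `count_failures` is a hypothetical helper name, not part of openQA, and it assumes the test environment (for example the test database that the Makefile targets set up) is already in place.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print "<failures>/<runs>".
# Usage: count_failures RUNS COMMAND [ARGS...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs"
}
```

For example, `count_failures 100 prove -l t/ui/26-jobs_restart.t` would print the number of failed runs out of 100. In the log above each run takes about 60 seconds, so 100 runs would need well over an hour.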

Suggestions

The tests report "All 10 subtests passed" but then fail with "Non-zero wait status: 14". We have had similar cases in the past. This likely has something to do with the cleanup of background processes. Maybe we recently introduced a regression that makes this background handling behave differently.

  • I suggest trying to reproduce the failure locally and bisecting between "last good" and "first bad" to find the culprit
  • docs/Contributing.asciidoc explains that "Non-zero wait status: 14" just means that the test now consistently times out, so something makes it very slow or the test effectively never ends. This can easily be checked locally by running the test a few times
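
To see why "wait status: 14" points to a timeout rather than a failed assertion, the value can be split into its parts. The sketch below assumes that the Wstat reported by prove is a raw POSIX-style wait status as in Perl's `$?` (low 7 bits: terminating signal, high byte: exit code); `decode_wstat` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: split a raw wait status into signal and exit code.
# Assumption: low 7 bits hold the signal number, the high byte the exit code.
decode_wstat() {
    wstat=$1
    sig=$((wstat & 127))
    code=$((wstat >> 8))
    if [ "$sig" -ne 0 ]; then
        echo "killed by signal $sig"
    else
        echo "exited with code $code"
    fi
}
```

Here `decode_wstat 14` reports signal 14, which on Linux is SIGALRM. That is consistent with an alarm-based timeout ending the test process after its subtests had already passed, whereas an ordinary test failure with exit code 1 would show up as wait status 256.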

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #95009: unstable/flaky test t/ui/12-needle-edit.t size:M (Resolved, tinita, 2021-07-01)
