action #95009
closedunstable/flaky test t/ui/12-needle-edit.t size:M
Description
Observation¶
At least in two places I have seen t/ui/12-needle-edit.t failing, e.g. https://app.circleci.com/pipelines/github/os-autoinst/openQA/6868/workflows/e88ba45b-b130-477f-8095-441095d446a0/jobs/64683/steps
[18:12:47] t/ui/12-needle-edit.t ...................... 4/?
# Failed test 'sub process openqa-webapi terminated with exit code 10752'
# at /home/squamata/project/t/ui/../lib/OpenQA/Test/Utils.pm line 223.
A context appears to have been destroyed without first calling release().
Based on $@ it does not look like an exception was thrown (this is not always
a reliable test)
This is a problem because the global error variables ($!, $@, and $?) will
not be restored. In addition some release callbacks will not work properly from
inside a DESTROY method.
Here are the context creation details, just in case a tool forgot to call
release():
File: t/ui/12-needle-edit.t
Line: 415
Tool: Test::More::subtest
Cleaning up the CONTEXT stack...
# Failed test 'no (unexpected) warnings (via END block)'
# at /usr/lib/perl5/5.26.1/Test/Builder.pm line 135.
# Got the following unexpected warnings:
# 1: A context appears to have been destroyed without first calling release().
# Based on $@ it does not look like an exception was thrown (this is not always
# a reliable test)
#
# This is a problem because the global error variables ($!, $@, and $?) will
# not be restored. In addition some release callbacks will not work properly from
# inside a DESTROY method.
#
# Here are the context creation details, just in case a tool forgot to call
# release():
# File: t/ui/12-needle-edit.t
# Line: 415
# Tool: Test::More::subtest
#
# Cleaning up the CONTEXT stack...
# Test ended with extra hubs on the stack!
# Tests were run but no plan was declared and done_testing() was not seen.
[18:12:47] t/ui/12-needle-edit.t ...................... Dubious, test returned 255 (wstat 65280, 0xff00)
All 4 subtests passed
Acceptance criteria¶
- AC1: t/ui/12-needle-edit.t stable over multiple runs in CI
Suggestions¶
- Move test to unstable tests
- Find out what is the failure ratio, e.g. take a look from past circleCI jobs
- Reproduce instability in CI or locally or fix likely problem
- Research in git logs if there are recent related changes
- Ensure it's stable locally
- Remove from unstable tests again
Updated by okurz over 3 years ago
https://github.com/os-autoinst/openQA/pull/4001 for the first step to mark it unstable (not a fix per se)
Updated by okurz over 3 years ago
- Related to action #95024: openQA test t/ui/26-jobs_restart.t very unstable (already marked as unstable) size:M added
Updated by okurz over 3 years ago
- Subject changed from unstable test t/ui/12-needle-edit.t to unstable/flaky test t/ui/12-needle-edit.t size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by tinita over 3 years ago
The verbose output looks like that:
ok 4 - Saving needle when "taking matches" selected but no matches present
# Subtest: Saving needle when "taking matches" not selected
ok 1 - Wait for jQuery successful
[debug] [zpoeCoUn393c] POST "/tests/99938/modules/logpackages/steps/1"
[debug] [zpoeCoUn393c] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_operator"
[debug] [zpoeCoUn393c] Routing to controller "OpenQA::WebAPI::Controller::Step" and action "save_needle_ajax"
[debug] [zpoeCoUn393c] Template "step/save_needle_ajax.html.ep" not found
[debug] [zpoeCoUn393c] Nothing has been rendered, expecting delayed response
[debug] Inactivity timeout # <-------------------------------------- not seen in a successful test. This is a message from Mojolicious
[debug] [Qf1y73voCtD_] POST "/tests/99938/modules/logpackages/steps/1"
[debug] [Qf1y73voCtD_] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_operator"
[debug] [Qf1y73voCtD_] Routing to controller "OpenQA::WebAPI::Controller::Step" and action "save_needle_ajax"
[debug] [Qf1y73voCtD_] Template "step/save_needle_ajax.html.ep" not found
[debug] [Qf1y73voCtD_] Nothing has been rendered, expecting delayed response
# Here we can see several seconds where nothing happens <----------------------------------------------
[debug] Process 248 is performing job "8" with task "save_needle"
[error] [zpoeCoUn393c] Transaction already destroyed at /usr/lib/perl5/vendor_perl/5.26.1/Mojolicious/Renderer.pm line 29.
[error] [pid:134] Stopping openqa-webapi process because a Perl warning occurred: Unhandled rejected promise: Transaction already destroyed at /usr/lib/perl5/vendor_perl/5.26.1/Mojolicious/Renderer.pm line 29.
at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Reactor/Poll.pm line 129.
not ok 2 - sub process openqa-webapi terminated with exit code 10752
# Failed test 'sub process openqa-webapi terminated with exit code 10752'
# at /home/squamata/project/t/ui/../lib/OpenQA/Test/Utils.pm line 223.
1..2
A context appears to have been destroyed without first calling release().
Based on $@ it does not look like an exception was thrown (this is not always
a reliable test)
...
Here the output from a successful test:
# Subtest: Saving needle when "taking matches" not selected
ok 1 - Wait for jQuery successful
[debug] [3sSUpOuWaEeJ] POST "/tests/99938/modules/logpackages/steps/1"
[debug] [3sSUpOuWaEeJ] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_operator"
[debug] [3sSUpOuWaEeJ] Routing to controller "OpenQA::WebAPI::Controller::Step" and action "save_needle_ajax"
[debug] [3sSUpOuWaEeJ] Template "step/save_needle_ajax.html.ep" not found
[debug] [3sSUpOuWaEeJ] Nothing has been rendered, expecting delayed response
[debug] Process 24107 is performing job "8" with task "save_needle"
[debug] [3sSUpOuWaEeJ] 200 OK (0.518946s, 1.927/s)
ok 2 - Wait for jQuery successful
[debug] [VCYOTJ7f9TEU] POST "/tests/99938/modules/logpackages/steps/1"
[debug] [VCYOTJ7f9TEU] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_operator"
[debug] [VCYOTJ7f9TEU] Routing to controller "OpenQA::WebAPI::Controller::Step" and action "save_needle_ajax"
[debug] [VCYOTJ7f9TEU] Template "step/save_needle_ajax.html.ep" not found
[debug] [VCYOTJ7f9TEU] Nothing has been rendered, expecting delayed response
[debug] Process 24118 is performing job "9" with task "save_needle"
[debug] [VCYOTJ7f9TEU] 200 OK (0.5342s, 1.872/s)
[debug] Process 24121 is performing job "10" with task "scan_needles"
ok 3 - Wait for jQuery successful
ok 4 - highlight appears correct
1..4
ok 5 - Saving needle when "taking matches" not selected
Updated by tinita over 3 years ago
I find it very interesting that the test label seems to be replaced with an error message:
not ok 2 - sub process openqa-webapi terminated with exit code 10752
Updated by tinita over 3 years ago
I made a branch "circleci-invest" from https://github.com/os-autoinst/openQA/commit/b8a7c9d3d38c57589d33a4ad5db02db0a04f9e8a ("Merge pull request #3991 from Martchus/carry-over-2") from yesterday. Until then we didn't see the failures.
The test shows the same failures we see in current master:
https://app.circleci.com/pipelines/github/perlpunk/openQA/204/workflows/b6dfabfd-557f-437c-bd51-1ef0e3c64ee1
- Checkstyle has a timeout https://app.circleci.com/pipelines/github/perlpunk/openQA/204/workflows/b6dfabfd-557f-437c-bd51-1ef0e3c64ee1/jobs/990
- API has a timeout https://app.circleci.com/pipelines/github/perlpunk/openQA/204/workflows/b6dfabfd-557f-437c-bd51-1ef0e3c64ee1/jobs/994
- Unstable has unknown failures (CircleCI shows red failure, but no failed tests are seen in the logs or artifacts
- t/ui/27-plugin_obs_rsync_obs_status.t has a timeout
- t/ui/12-needle-edit.t has the mentioned failure
Updated by tinita over 3 years ago
Similar results from https://github.com/os-autoinst/openQA/commit/eda517ece98346c10afe43e4bfdef1cbba7fca73 which is before the last two dependency crons including perl-Mojo-SQLite and perltidy
Can we take that as a proof that no changes from openQA side are responsible?
One thing that could be responsible is package updates that are not tracked in our ci-packages.txt
, that means the ones that are already installed in the container. (For example we had problems with a chromium update recently).
The other possibility is that CircleCI performance is pretty bad currently, so being very slow triggers all our explicitly set time limits, but also other timeouts like the Mojolicious one I mentioned above.
Updated by okurz over 3 years ago
tinita wrote:
Similar results from https://github.com/os-autoinst/openQA/commit/eda517ece98346c10afe43e4bfdef1cbba7fca73 which is before the last two dependency crons including perl-Mojo-SQLite and perltidy
Can we take that as a proof that no changes from openQA side are responsible?
openQA source code, yes.
One thing that could be responsible is package updates that are not tracked in our
ci-packages.txt
, that means the ones that are already installed in the container. (For example we had problems with a chromium update recently).
correct
The other possibility is that CircleCI performance is pretty bad currently, so being very slow triggers all our explicitly set time limits, but also other timeouts like the Mojolicious one I mentioned above.
Then research if other users report similar problems. Get in contact with the circleCI community and support.
Updated by tinita over 3 years ago
I didn't see any complaints from others in the CircleCI support forum.
It just seemed to be slower in general, and I assume most people don't have such timeouts in their tests as we have.
PR: https://github.com/os-autoinst/openQA/pull/4009
Although it looks like the performance is not as bad as friday anymore.
Updated by tinita over 3 years ago
- Status changed from Feedback to Resolved
This specific error didn't occur anymore, so the mojo timeout helped.
Also the other timeout bumps helped, but it looks like there might be a few more tests where we should increase the TimeLimit