action #75370
(closed) unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
Description
Observation
# Failed test 'test 1 is running'
# at t/full-stack.t line 128.
# Failed test 'worker did not propagate URL for os-autoinst cmd srv within 1 minute'
# at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 195.
# Failed test 'developer console for test 1'
# at t/full-stack.t line 134.
# Looks like you failed 2 tests of 3.
[02:46:51] t/full-stack.t .. 377/?
# Failed test 'wait until developer console becomes available'
# at t/full-stack.t line 135.
Steps to reproduce
- The failure is observed on CircleCI
- Still to be confirmed whether this can be reproduced locally with:
make test STABILITY_TEST=1 RETRY=500 FULLSTACK=1 TESTS=t/full-stack.t
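To see how often the failure happens, not just whether it happens, a plain loop that repeats the test and counts failing runs can help. The bash sketch below is illustrative only: `count_failures` is a hypothetical helper, not part of openQA, and it does not reproduce whatever `STABILITY_TEST=1` and `RETRY=500` do inside the Makefile.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openQA): run a command RUNS times and
# print how many of those runs exited with a non-zero status.
count_failures() {
    local runs=$1
    shift
    local failures=0 i
    for ((i = 1; i <= runs; i++)); do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# Possible usage, assuming a local openQA checkout where the full-stack test
# can run at all (the exact prove flags are an assumption):
#   count_failures 500 env FULLSTACK=1 prove -l t/full-stack.t
```

A non-zero count over e.g. 500 runs would confirm the flakiness locally, and dividing it by the number of runs gives a rough failure rate.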
Suggestions
- Add retries back
- Reproduce locally or within CircleCI
- Fix tests or production code
- Ensure stability with enough runs, e.g. 500
- Investigate regressions in the latest dependencies:
  - aspell-0.60.6.1 -> aspell-0.60.8
  - aspell-spell-0.60.6.1 -> aspell-spell-0.60.8
  - libaspell15-0.60.6.1 -> libaspell15-0.60.8
  - perl-IO-Socket-SSL-2.052 -> perl-IO-Socket-SSL-2.066
  - perl-Net-SSLeay-1.81 -> perl-Net-SSLeay-1.88
  - perl-PPIx-Regexp-0.058 -> perl-PPIx-Regexp-0.071
  - perl-Selenium-Remote-Driver-1.37 -> perl-Selenium-Remote-Driver-1.38
  - python3-pathspec-0.5.9 -> python3-pathspec-0.7.0
  - python3-yamllint-1.15.0 -> python3-yamllint-1.22.1
  - ShellCheck-0.6.0 -> ShellCheck-0.7.1
See also #75346 for a new failure on master in OBS.
Workaround
Retrigger, as this seems to be "sporadic".
Updated by okurz almost 4 years ago
- Related to action #75346: t/api/08-jobtemplates.t started failing in OBS checks added
Updated by okurz almost 4 years ago
- Status changed from New to Workable
- Priority changed from Normal to High
- Target version set to Ready
Updated by okurz almost 4 years ago
- Has duplicate action #76900: unstable/flaky/sporadic t/full-stack.t test failing in CircleCI "worker did not propagate URL for os-autoinst cmd srv within 1 minute" added
Updated by okurz almost 4 years ago
- Subject changed from t/full-stack.t failing on master to unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
- Description updated
- Priority changed from High to Normal
Prepared https://github.com/os-autoinst/openQA/pull/3503 (merged), which adds retries back at the Makefile level for now. This lowers the priority for us a bit.
Integrated duplicate report #76900.
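Retries at the Makefile level boil down to re-running a command until it passes or an attempt budget is used up. The bash function below is a hypothetical illustration of that idea (the name `retry` and its arguments are assumptions); it is not the actual openQA Makefile or tooling code.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper (not openQA's own code): re-run a command until
# it succeeds or ATTEMPTS tries have been used up.
retry() {
    local attempts=$1
    shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0 # success: stop retrying
        fi
        echo "attempt $i/$attempts failed: $*" >&2
    done
    return 1 # every attempt failed
}

# Possible usage (the test invocation is an assumption):
#   retry 3 env FULLSTACK=1 prove -l t/full-stack.t
```

Because a retry turns a sporadic failure into a pass, it also hides the underlying instability, which is why RETRY=3 was meant to be removed again before closing this ticket.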
Updated by livdywan almost 4 years ago
- Status changed from Workable to Feedback
This PR aims to address the issue with:
- a more predictable timeout (less specialized code)
- a longer timeout
- AJAX waits to avoid refreshing the page too fast
https://github.com/os-autoinst/openQA/pull/3504
Naturally this will need to be monitored in future builds.
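The first point above, one predictable timeout instead of specialized waiting code, can be illustrated by a generic poll-until-deadline helper. This bash sketch is hypothetical: `wait_until`, its arguments and the polling interval are assumptions, and it is not the code from https://github.com/os-autoinst/openQA/pull/3504.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: re-run a check every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed. SECONDS is bash's built-in
# counter of seconds since the shell started.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$((SECONDS + timeout))
    until "$@"; do
        if ((SECONDS >= deadline)); then
            echo "condition not met within ${timeout}s: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

A check like the one that failed in the observation above ("... within 1 minute") would then correspond to something like `wait_until 60 1 some_check`, where `some_check` stands for a hypothetical command testing the condition; making the timeout longer is a one-argument change.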
Updated by okurz almost 4 years ago
- Due date set to 2020-11-11
- Status changed from Resolved to Feedback
Hi cdywan, originally I assumed that the issue is linked to a change in dependencies, or something similar that caused the test to fail much more often in the mentioned steps than before. From both your tickets and your PR I don't see anything explaining whether you think this is just a coincidence or whether something in the dependencies really might have caused slightly different behaviour.
Also, we have added RETRY=3 back to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions, e.g. with 500 runs to verify stability.
Updated by livdywan almost 4 years ago
- Due date changed from 2020-11-11 to 2020-11-20
okurz wrote:
> Hi cdywan, originally I assumed that the issue is linked to a change in dependencies, or something similar that caused the test to fail much more often in the mentioned steps than before. From both your tickets and your PR I don't see anything explaining whether you think this is just a coincidence or whether something in the dependencies really might have caused slightly different behaviour.

I'm not sure it has to do with a dependency change either. My PR was about making the test more robust in general, i.e. maybe it "should have" been unreliable even before.

> Also, we have added RETRY=3 back to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions, e.g. with 500 runs to verify stability.

I think what we actually want is n passes on CircleCI. But dropping the RETRY again, yes.
Updated by livdywan almost 4 years ago
Updated by livdywan almost 4 years ago
- Status changed from Feedback to Resolved
As also mentioned on the PR, I checked that previous runs of the full-stack test on CircleCI succeeded on the first try (not counting one case where a PR failed to pull the container image). The PR is merged, so I think we can call this resolved, since the actual fix was already "in feedback" before.