action #75370: unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute" - openQA Project (public) - openSUSE Project Management Tool


action #75370

closed


Added by livdywan over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: livdywan
Category: Regressions/Crashes
Target version: Ready
Start date: 2020-10-27
Due date: 2020-11-20

Description

Observation

https://app.circleci.com/pipelines/github/os-autoinst/openQA/4619/workflows/befb448a-59ed-46b7-b98d-dd4f3d2f035f/jobs/44126/steps

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 128.

    #   Failed test 'worker did not propagate URL for os-autoinst cmd srv within 1 minute'
    #   at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 195.

    #   Failed test 'developer console for test 1'
    #   at t/full-stack.t line 134.
    # Looks like you failed 2 tests of 3.
[02:46:51] t/full-stack.t .. 377/? 
#   Failed test 'wait until developer console becomes available'
#   at t/full-stack.t line 135.

Steps to reproduce

The failure is observed on CircleCI
To be confirmed if this can be reproduced locally with

make test STABILITY_TEST=1 RETRY=500 FULLSTACK=1 TESTS=t/full-stack.t
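Outside the project Makefile, the idea behind such a stability run can be sketched as a loop that repeats a command and stops at the first failure. This is only an illustration under assumptions: `run_until_failure` is a hypothetical helper, not the way openQA implements `STABILITY_TEST` and `RETRY`, whose exact behaviour should be checked in the Makefile itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): run a command up to N times
# and stop at the first failing run.
# Usage: run_until_failure N command [args...]
run_until_failure() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@"; then
            # Report which run exposed the sporadic failure.
            echo "run $i of $runs failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs runs passed"
}
```

It could be used as `run_until_failure 500 make test FULLSTACK=1 TESTS=t/full-stack.t`. A run count that high is what makes a rare failure likely to show up at least once.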

Suggestions

Add retries back
Reproduce locally or within circleCI
Fix tests or production code
Ensure stability with enough runs, e.g. 500
Investigate regressions in latest dependencies
- aspell-0.60.6.1 -> aspell-0.60.8
- aspell-spell-0.60.6.1 -> aspell-spell-0.60.8
- libaspell15-0.60.6.1 -> libaspell15-0.60.8
- perl-IO-Socket-SSL-2.052 -> perl-IO-Socket-SSL-2.066
- perl-Net-SSLeay-1.81 -> perl-Net-SSLeay-1.88
- perl-PPIx-Regexp-0.058 -> perl-PPIx-Regexp-0.071
- perl-Selenium-Remote-Driver-1.37 -> perl-Selenium-Remote-Driver-1.38
- python3-pathspec-0.5.9 -> python3-pathspec-0.7.0
- python3-yamllint-1.15.0 -> python3-yamllint-1.22.1
- ShellCheck-0.6.0 -> ShellCheck-0.7.1

See also #75346 for a new failure on master in OBS.

Workaround

Retrigger as this seems to be "sporadic".

Related issues 2 (0 open — 2 closed)


Updated by okurz over 4 years ago

Related to action #75346: t/api/08-jobtemplates.t started failing in OBS checks added


Updated by okurz over 4 years ago

Status changed from New to Workable
Priority changed from Normal to High
Target version set to Ready


Updated by okurz over 4 years ago

Has duplicate action #76900: unstable/flaky/sporadic t/full-stack.t test failing in CircleCI "worker did not propagate URL for os-autoinst cmd srv within 1 minute" added


Updated by okurz over 4 years ago

Subject changed from t/full-stack.t failing on master to unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
Description updated (diff)
Priority changed from High to Normal

Prepared https://github.com/os-autoinst/openQA/pull/3503 (merged), which adds back retries at the Makefile level for now. This reduces the priority for us a bit.

Integrated duplicate report #76900
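The effect of retries at this level can be sketched as a wrapper that repeats a command until it passes, up to a fixed number of attempts. This is a minimal sketch under assumptions: `retry_cmd` is a hypothetical helper, not the code from the PR, and the Makefile's actual `RETRY` semantics may differ.

```shell
#!/bin/sh
# Hypothetical helper (not the openQA Makefile logic): run a command
# until it succeeds, giving up after N attempts in total.
# Usage: retry_cmd N command [args...]
retry_cmd() {
    attempts=$1
    shift
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
    done
}
```

A wrapper like this turns a sporadic failure into a pass as long as one attempt succeeds. It hides the flakiness rather than fixing it, so it only lowers the urgency.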


Updated by livdywan over 4 years ago

Status changed from Workable to Feedback

This PR aims to address the issue with

a more predictable timeout (less specialized code)
a longer timeout
ajax waits to avoid refreshing the page too fast

https://github.com/os-autoinst/openQA/pull/3504

Naturally this will need to be monitored in future builds.
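The general pattern behind a more predictable timeout can be sketched as a fixed-interval poll with a hard deadline. This is not the code from the PR (the real helper is Perl, in t/lib/OpenQA/Test/FullstackUtils.pm); `wait_for` and its parameters are hypothetical, and the sketch assumes a `date` that supports `+%s`, as the GNU and BSD versions do.

```shell
#!/bin/sh
# Hypothetical helper: poll a command every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_for TIMEOUT INTERVAL command [args...]
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    # Fix the deadline once, so slow checks cannot stretch the total wait.
    deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "condition not met within ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

Computing the deadline up front, instead of counting loop iterations, bounds the total wait by wall-clock time however long each individual check takes.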


Updated by livdywan over 4 years ago

Assignee set to livdywan


Updated by livdywan over 4 years ago

Status changed from Feedback to Resolved


Updated by okurz over 4 years ago

Due date set to 2020-11-11
Status changed from Resolved to Feedback

Hi cdywan, originally I assumed that the issue was linked to a change in dependencies or something else that caused the test to fail much more often in the mentioned steps than before. From neither your tickets nor your PR can I tell whether you think this is just coincidence or whether something in the dependencies really caused slightly different behaviour.

Also, we have added back RETRY=3 to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.


Updated by livdywan over 4 years ago

Due date changed from 2020-11-11 to 2020-11-20

okurz wrote:

Hi cdywan, originally I assumed that the issue was linked to a change in dependencies or something else that caused the test to fail much more often in the mentioned steps than before. From neither your tickets nor your PR can I tell whether you think this is just coincidence or whether something in the dependencies really caused slightly different behaviour.

I'm not positive it has to do with a dependency change either. My PR was about making the test more robust in general, i.e. maybe it "should have" been unreliable even before.

Also, we have added back RETRY=3 to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.

I think what we actually want is n passes on CircleCI. But dropping the RETRY again, yes.


Updated by livdywan over 4 years ago

cdywan wrote:

okurz wrote:

Also, we have added back RETRY=3 to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.

I think what we actually want is n passes on CircleCI. But dropping the RETRY again, yes.

https://github.com/os-autoinst/openQA/pull/3562


Updated by livdywan over 4 years ago

Status changed from Feedback to Resolved

As also mentioned on the PR, I checked that previous runs of the fullstack test on CircleCI succeeded on the first try (not counting a case of a PR failing to pull the container image). PR is merged, so I think we can call this resolved since the actual fix was already "in feedback" before.

