action #75370 (closed)

unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"

Added by livdywan over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2020-10-27
Due date: 2020-11-20
% Done: 0%
Estimated time:

Description

Observation

https://app.circleci.com/pipelines/github/os-autoinst/openQA/4619/workflows/befb448a-59ed-46b7-b98d-dd4f3d2f035f/jobs/44126/steps

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 128.

    #   Failed test 'worker did not propagate URL for os-autoinst cmd srv within 1 minute'
    #   at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 195.

    #   Failed test 'developer console for test 1'
    #   at t/full-stack.t line 134.
    # Looks like you failed 2 tests of 3.
[02:46:51] t/full-stack.t .. 377/? 
#   Failed test 'wait until developer console becomes available'
#   at t/full-stack.t line 135.
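
For context, the failing check ("worker did not propagate URL for os-autoinst cmd srv within 1 minute") is essentially a poll-with-timeout wait in t/lib/OpenQA/Test/FullstackUtils.pm. The following is only a minimal sketch of that pattern, not the actual openQA helper; the function name and the status-query coderef are hypothetical:

    use strict;
    use warnings;
    use Time::HiRes qw(sleep time);

    # Hypothetical illustration: keep querying the worker status until it reports
    # the os-autoinst command server URL, or give up after the timeout
    # (1 minute in the failing check).
    sub wait_for_cmd_srv_url {
        my ($query_worker_status, $timeout) = @_;    # coderef returning a status hashref
        $timeout //= 60;
        my $deadline = time + $timeout;
        while (time < $deadline) {
            my $status = $query_worker_status->();
            return $status->{cmd_srv_url} if $status && $status->{cmd_srv_url};
            sleep 0.5;                               # poll twice per second
        }
        return undef;                                # timed out
    }

    # The surrounding test then fails with the observed message when this returns
    # undef, e.g.:
    # ok(defined wait_for_cmd_srv_url($query_status),
    #     'worker did not propagate URL for os-autoinst cmd srv within 1 minute');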

Steps to reproduce

  • The failure is observed on CircleCI
  • To be confirmed if this can be reproduced locally with
make test STABILITY_TEST=1 RETRY=500 FULLSTACK=1 TESTS=t/full-stack.t

Suggestions

  • Add retries back
  • Reproduce locally or within CircleCI
  • Fix tests or production code
  • Ensure stability with enough runs, e.g. 500
  • Investigate regressions in latest dependencies
    • aspell-0.60.6.1 -> aspell-0.60.8
    • aspell-spell-0.60.6.1 -> aspell-spell-0.60.8
    • libaspell15-0.60.6.1 -> libaspell15-0.60.8
    • perl-IO-Socket-SSL-2.052 -> perl-IO-Socket-SSL-2.066
    • perl-Net-SSLeay-1.81 -> perl-Net-SSLeay-1.88
    • perl-PPIx-Regexp-0.058 -> perl-PPIx-Regexp-0.071
    • perl-Selenium-Remote-Driver-1.37 -> perl-Selenium-Remote-Driver-1.38
    • python3-pathspec-0.5.9 -> python3-pathspec-0.7.0
    • python3-yamllint-1.15.0 -> python3-yamllint-1.22.1
    • ShellCheck-0.6.0 -> ShellCheck-0.7.1

See also #75346 for a new failure on master in OBS.

Workaround

Retrigger as this seems to be "sporadic".


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #75346: t/api/08-jobtemplates.t started failing in OBS checks (Resolved, tinita, 2020-10-26)

Has duplicate openQA Project - action #76900: unstable/flaky/sporadic t/full-stack.t test failing in CircleCI "worker did not propagate URL for os-autoinst cmd srv within 1 minute" (Resolved, okurz)

Actions #1

Updated by okurz over 3 years ago

  • Related to action #75346: t/api/08-jobtemplates.t started failing in OBS checks added
Actions #2

Updated by okurz over 3 years ago

  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #3

Updated by okurz over 3 years ago

  • Has duplicate action #76900: unstable/flaky/sporadic t/full-stack.t test failing in CircleCI "worker did not propagate URL for os-autoinst cmd srv within 1 minute" added
Actions #4

Updated by okurz over 3 years ago

  • Subject changed from t/full-stack.t failing on master to unstable/flaky/sporadic t/full-stack.t failing on master (circleCI) "worker did not propagate URL for os-autoinst cmd srv within 1 minute"
  • Description updated (diff)
  • Priority changed from High to Normal

Prepared https://github.com/os-autoinst/openQA/pull/3503 (merged), which adds back retries at the Makefile level for now. This reduces the priority for us a bit.

Integrated duplicate report #76900

Actions #5

Updated by livdywan over 3 years ago

  • Status changed from Workable to Feedback

This PR aims to address the issue with

  • a more predictable timeout (less specialized code)
  • a longer timeout
  • AJAX waits to avoid refreshing the page too fast (a rough sketch of this idea follows below)

https://github.com/os-autoinst/openQA/pull/3504

Naturally this will need to be monitored in future builds.
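
For illustration, the AJAX-wait idea could look roughly like the following. This is only a sketch under the assumption of a Selenium::Remote::Driver-based test; the actual helper in the PR may differ:

    use strict;
    use warnings;
    use Time::HiRes qw(sleep time);

    # Sketch only: poll until no jQuery AJAX requests are pending instead of
    # refreshing the page at a fixed, fast interval. $driver is assumed to be a
    # Selenium::Remote::Driver instance; the timeout mirrors the "longer timeout".
    sub wait_for_ajax_idle {
        my ($driver, $timeout) = @_;
        $timeout //= 60;
        my $deadline = time + $timeout;
        while (time < $deadline) {
            # jQuery.active is 0 when no AJAX requests are in flight
            my $active = eval { $driver->execute_script('return window.jQuery ? jQuery.active : 0') };
            return 1 if defined $active && $active == 0;
            sleep 0.25;
        }
        return 0;    # caller decides whether to fail or retry
    }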

Actions #6

Updated by livdywan over 3 years ago

  • Assignee set to livdywan
Actions #7

Updated by livdywan over 3 years ago

  • Status changed from Feedback to Resolved
Actions #8

Updated by okurz over 3 years ago

  • Due date set to 2020-11-11
  • Status changed from Resolved to Feedback

Hi cdywan, originally I assumed that the issue is linked to a change in dependencies or something else that caused the test to fail much more often in the mentioned steps than before. From your tickets and your PR I can't tell whether you think this is just coincidence or whether something in the dependencies really caused slightly different behaviour.

Also, we have added RETRY=3 back to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.

Actions #9

Updated by livdywan over 3 years ago

  • Due date changed from 2020-11-11 to 2020-11-20

okurz wrote:

Hi cdywan, originally I assumed that the issue is linked to a change in dependencies or something else that caused the test to fail much more often in the mentioned steps than before. From your tickets and your PR I can't tell whether you think this is just coincidence or whether something in the dependencies really caused slightly different behaviour.

I'm not positive it has to do with a dependency change either; my PR was about making the test more robust in general, i.e. maybe it "should have" been unreliable even before.

Also, we have added RETRY=3 back to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.

I think what we actually want is n passes on CircleCI. But dropping the RETRY again, yes.

Actions #10

Updated by livdywan over 3 years ago

cdywan wrote:

okurz wrote:

Also, we have added RETRY=3 back to the Makefile for t/full-stack.t, which we should remove before calling this done. I suggest following the suggestions in https://progress.opensuse.org/issues/75370#Suggestions with e.g. 500 runs to verify stability.

I think what we actually want is n passes on CircleCI. But dropping the RETRY again, yes.

https://github.com/os-autoinst/openQA/pull/3562

Actions #11

Updated by livdywan over 3 years ago

  • Status changed from Feedback to Resolved

As also mentioned on the PR, I checked that previous runs of the full-stack test on CircleCI succeeded on the first try (not counting one case of a PR failing to pull the container image). The PR is merged, so I think we can call this resolved since the actual fix was already "in feedback" before.
