action #80800
Updated by mkittler almost 4 years ago
## Observation

In https://app.circleci.com/pipelines/github/os-autoinst/openQA/5211/workflows/9d31f5c4-fd61-410e-890d-0006dfef245f/jobs/49513/parallel-runs/0/steps/0-111?invite=true (raw document: https://circleci.com/api/v1.1/project/github/os-autoinst/openQA/49513/output/111/0?file=true&allocation-id=5fce507fd2bd8103886c1465-0-build%2F71C05C00) I saw t/full-stack.t failing with

```
RETRY=0 timeout -s SIGINT -k 5 -v $((9 * (0 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer t/full-stack.t
[16:26:38] t/full-stack.t .. 375/? # full result panel contents:
# State: scheduled
# Scheduled product: job has not been created by posting an ISO
# 50
# Failed test 'test 1 is running'
# at t/full-stack.t line 128.
Bailout called. Further testing stopped: URL for os-autoinst cmd srv not available
```

The full log from the background services is available in https://app.circleci.com/pipelines/github/os-autoinst/openQA/5211/workflows/9d31f5c4-fd61-410e-890d-0006dfef245f/jobs/49513/artifacts

I think I have seen this before.

## Steps to reproduce

TBC

## Expected result

t/full-stack.t should be stable, e.g. pass in 1000/1000 runs.

## Suggestions

* Check https://49513-20883829-gh.circle-artifacts.com/0/artifacts/full-stack.t
* Try to reproduce the issue in your local environment, e.g. with `time env FULLSTACK=1 runs=400 count_fail_ratio prove -l t/full-stack.t` using https://github.com/okurz/scripts/blob/master/count_fail_ratio http://github.com/okurz/scripts/bin/count_fail_ratio
* Improve the error output of the full stack test, e.g. the message "waiting for worker to propagate URL for os-autoinst cmd srv" could be more verbose and state what it found and what it expected
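
If the `count_fail_ratio` script is not at hand, the reproduction idea from the suggestions can be approximated with a small loop. This is only a minimal sketch and not the actual `count_fail_ratio` implementation: the function name `fail_ratio` and its output format are made up for illustration, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical stand-in for count_fail_ratio (NOT the real script):
# run a command N times and report how many of the runs failed.
fail_ratio() {
    runs=$1
    shift
    # Refuse a run count below 1 to avoid dividing by zero later
    if [ "$runs" -lt 1 ]; then
        echo "usage: fail_ratio RUNS COMMAND [ARGS...]" >&2
        return 2
    fi
    fails=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only its exit code matters here
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    # Integer percentage, rounded down
    echo "fails: $fails/$runs ($((fails * 100 / runs))%)"
}
```

It could then be called as `fail_ratio 400 env FULLSTACK=1 prove -l t/full-stack.t`. Because the sketch discards the test output, it only answers how often the test fails; to see why a run failed, the output would need to be kept instead of redirected to `/dev/null`.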