Project

General

Profile

Actions

action #80800

closed

flaky/unstable t/full-stack.t, Failed test 'test 1 is running', Bailout called. Further testing stopped: URL for os-autoinst cmd srv not available

Added by okurz over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2020-12-07
Due date:
2021-02-24
% Done:

0%

Estimated time:

Description

Observation

In https://app.circleci.com/pipelines/github/os-autoinst/openQA/5211/workflows/9d31f5c4-fd61-410e-890d-0006dfef245f/jobs/49513/parallel-runs/0/steps/0-111?invite=true (raw document: https://circleci.com/api/v1.1/project/github/os-autoinst/openQA/49513/output/111/0?file=true&allocation-id=5fce507fd2bd8103886c1465-0-build%2F71C05C00) I saw t/full-stack.t failing with

RETRY=0 timeout -s SIGINT -k 5 -v $((9 * (0 + 1) ))m tools/retry prove -l --harness TAP::Harness::JUnit --timer t/full-stack.t
[16:26:38] t/full-stack.t .. 375/? # full result panel contents:
# State: scheduled
# Scheduled product: job has not been created by posting an ISO
# 50

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 128.
Bailout called.  Further testing stopped:  URL for os-autoinst cmd srv not available

full log from background services available in https://app.circleci.com/pipelines/github/os-autoinst/openQA/5211/workflows/9d31f5c4-fd61-410e-890d-0006dfef245f/jobs/49513/artifacts

I think I have seen this in before.

Steps to reproduce

TBC

Expected result

t/full-stack.t should be stable, e.g. in 1000/1000 runs.

Suggestions


Related issues 1 (1 open0 closed)

Copied from openQA Project - action #72085: flaky/unstable t/05-scheduler-full.t, Failed test 'Allocated maximum number of jobs that could have been allocated'Workable

Actions
Actions #1

Updated by okurz over 3 years ago

  • Copied from action #72085: flaky/unstable t/05-scheduler-full.t, Failed test 'Allocated maximum number of jobs that could have been allocated' added
Actions #2

Updated by livdywan over 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

I kind of started investigating this as a side-effect of #75232 because I was running and hacking this one. I'm not positive yet that I caught the exact same cause, but the symptom is the exact same one and I'm seeing if I can improve the failure a little bit.

Actions #3

Updated by openqa_review over 3 years ago

  • Due date set to 2021-01-27

Setting due date based on mean cycle time of SUSE QE Tools

Actions #4

Updated by livdywan about 3 years ago

okurz wrote:

  • Improve error output of full stack test, e.g. the message "waiting for worker to propagate URL for os-autoinst cmd srv" could be a bit more verbose and tell what it found and what it expects

https://github.com/os-autoinst/openQA/pull/3685

Actions #5

Updated by livdywan about 3 years ago

  • Status changed from In Progress to Feedback

I'll leave my small suggestions up for now and see if we can discuss them more but unassign since I can't reproduce and therefore fix the issue as such.

Actions #6

Updated by livdywan about 3 years ago

  • Status changed from Feedback to Workable
Actions #7

Updated by livdywan about 3 years ago

  • Assignee deleted (livdywan)
Actions #8

Updated by livdywan about 3 years ago

  • Due date deleted (2021-01-27)
Actions #9

Updated by mkittler about 3 years ago

  • Description updated (diff)
  • Assignee set to mkittler

Artifacts are only stored for 30 days by CircleCI. Please don't just add the links to the ticket as they become useless very soon.

Not sure what the context of "waiting for worker to propagate URL for os-autoinst cmd srv" was because the log file containing it doesn't exist anymore.

I can still try to reproduce it locally.

Actions #10

Updated by mkittler about 3 years ago

The next comment contains the log messages which turned out to be relevant.


I've ran into another issue locally but after fixing that I could actually reproduce this issue.

The test setup looks like this:

prove -l -v t/full-stack.t
t/full-stack.t .. 
# OPENQA_BASEDIR: /tmp/qwV5zLprfY/full-stack.d
# OPENQA_CONFIG: /tmp/qwV5zLprfY/full-stack.d/config
[INFO] using database schema "public"
[info] [pid:8421] setting database search path to public when registering Minion plugin
ok 1 - assets are prefetched
# Used ports: {}
# Starting WebSocket service. Port: 9527
# Bogus: 0 | No wait: 0
# Starting WebUI service. Port: 9526
# PID of openqa-websocket: 8441
# PID of openqa-webapi: 8442
[info] [pid:8442] setting database search path to public when registering Minion plugin
[info] Listening at "http://127.0.0.1:9527"
Web application available at http://127.0.0.1:9527
[debug] [sg2vn6Gi] GET "/"
[debug] [sg2vn6Gi] Routing to a callback
[debug] [sg2vn6Gi] Rendering template "main/index.html.ep"
[debug] [sg2vn6Gi] Rendering template "layouts/info.html.ep"
[debug] [sg2vn6Gi] Rendering template "branding/openSUSE/docbox.html.ep"
[debug] [sg2vn6Gi] Rendering template "branding/plain/docbox.html.ep"
[debug] [sg2vn6Gi] Rendering template "branding/openSUSE/sponsorbox.html.ep"
[debug] [sg2vn6Gi] Rendering template "layouts/bootstrap.html.ep"
[debug] [sg2vn6Gi] Rendering template "layouts/navbar.html.ep"
[debug] [sg2vn6Gi] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [sg2vn6Gi] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [sg2vn6Gi] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [sg2vn6Gi] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [sg2vn6Gi] 200 OK (0.031714s, 31.532/s)
[debug] [qf-4dpR3] GET "/asset/92081e1a5e/bootstrap.css"
[debug] [qf-4dpR3] Routing to a callback
[debug] [qf-4dpR3] 200 OK (0.00064s, 1562.500/s)
[debug] [dNkI8iNz] GET "/asset/a91164e414/bootstrap.js"
[debug] [dNkI8iNz] Routing to a callback
[debug] [dNkI8iNz] 200 OK (0.000646s, 1547.988/s)
[debug] [hXreA2Y2] GET "/asset/518553d3d2/logo.svg"
[debug] [hXreA2Y2] Routing to a callback
[debug] [hXreA2Y2] 200 OK (0.000848s, 1179.245/s)
[debug] [Z-QugcC3] GET "/asset/308c1fc4a9/suse.png"
[debug] [Z-QugcC3] Routing to a callback
[debug] [Z-QugcC3] 200 OK (0.000579s, 1727.116/s)
[debug] [5PqabyOA] GET "/dashboard_build_results"
[debug] [5PqabyOA] Routing to controller "OpenQA::WebAPI::Controller::Main" and action "dashboard_build_results"
[debug] [5PqabyOA] Rendering template "main/dashboard_build_results.html.ep"
[debug] [5PqabyOA] 200 OK (0.003011s, 332.116/s)
[debug] [xeXDm1HJ] GET "/asset/27c88aea39/fa-solid-900.woff2"
[debug] [xeXDm1HJ] Routing to a callback
[debug] [xeXDm1HJ] 200 OK (0.000631s, 1584.786/s)
ok 2 - resultdir "/tmp/qwV5zLprfY/full-stack.d/openqa/testresults" exists
# PID of openqa-livehandler: 8548
ok 3 - on main page
ok 4 - no one logged in
[debug] [djUy9X3V] GET "/login"
[debug] [djUy9X3V] Routing to controller "OpenQA::Shared::Controller::Session" and action "create"
[debug] [djUy9X3V] 302 Found (0.019874s, 50.317/s)
[debug] [TilCuOkv] GET "/"
[debug] [TilCuOkv] Routing to a callback
[debug] [TilCuOkv] Rendering cached template "main/index.html.ep"
[debug] [TilCuOkv] Rendering cached template "layouts/info.html.ep"
[debug] [TilCuOkv] Rendering cached template "branding/openSUSE/docbox.html.ep"
[debug] [TilCuOkv] Rendering cached template "branding/plain/docbox.html.ep"
[debug] [TilCuOkv] Rendering cached template "branding/openSUSE/sponsorbox.html.ep"
[debug] [TilCuOkv] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [TilCuOkv] Rendering cached template "layouts/navbar.html.ep"
[debug] [TilCuOkv] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [TilCuOkv] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [TilCuOkv] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [TilCuOkv] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [TilCuOkv] 200 OK (0.010606s, 94.286/s)
[debug] [dxxqWcUg] GET "/dashboard_build_results"
[debug] [dxxqWcUg] Routing to controller "OpenQA::WebAPI::Controller::Main" and action "dashboard_build_results"
[debug] [dxxqWcUg] Rendering cached template "main/dashboard_build_results.html.ep"
[debug] [dxxqWcUg] 200 OK (0.001134s, 881.834/s)
ok 5 - click_ok'Login clicked'
ok 6 - back on main page
ok 7 - click_ok'disable tour permanently'
[debug] [1UHaT_mT] POST "/api/v1/feature"
[debug] [1UHaT_mT] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth"
ok 8 - click_ok'confirm dismissing tour'
[debug] API auth by user: Demo, operator: 1
[debug] [1UHaT_mT] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Feature" and action "informed"
[debug] [1UHaT_mT] 200 OK (0.007431s, 134.571/s)
[debug] [Btb1XFTc] POST "/api/v1/jobs"
[debug] [Btb1XFTc] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [Btb1XFTc] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Job" and action "create"
[debug] [Btb1XFTc] 200 OK (0.036118s, 27.687/s)
[warn] [pid:8442] Unable to wakeup scheduler: Connection refused
[debug] [HI-G3u9K] GET "/tests"
[debug] [HI-G3u9K] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list"
[debug] [HI-G3u9K] Rendering template "test/list.html.ep"
[debug] [HI-G3u9K] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [HI-G3u9K] Rendering cached template "layouts/navbar.html.ep"
[debug] [HI-G3u9K] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [HI-G3u9K] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [HI-G3u9K] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [HI-G3u9K] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [HI-G3u9K] 200 OK (0.011185s, 89.405/s)
ok 9 - click_ok'Clicked All Tests to look for scheduled job'
[debug] [czXrIOZ4] GET "/tests/list_running_ajax"
[debug] [czXrIOZ4] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_running_ajax"
[debug] [czXrIOZ4] 200 OK (0.01118s, 89.445/s)
[debug] [QWj52Qcc] GET "/tests/list_scheduled_ajax"
[debug] [QWj52Qcc] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_scheduled_ajax"
[debug] [QWj52Qcc] 200 OK (0.005428s, 184.230/s)
[debug] [3lVf6Unc] GET "/tests/list_ajax"
[debug] [3lVf6Unc] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_ajax"
[debug] [3lVf6Unc] 200 OK (0.015692s, 63.727/s)
[debug] [w9AQ_xcC] GET "/asset/e38e3130ae/fa-regular-400.woff2"
[debug] [w9AQ_xcC] Routing to a callback
[debug] [w9AQ_xcC] 200 OK (0.000714s, 1400.560/s)
ok 10 - tests followed
ok 11 - Wait for jQuery successful: wait before checking for scheduled job
[debug] [9k490l9k] GET "/tests/1"
[debug] [9k490l9k] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [9k490l9k] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "show"
[debug] [9k490l9k] Rendering template "test/result.html.ep"
[debug] [9k490l9k] Rendering cached template "layouts/info.html.ep"
[debug] [9k490l9k] Rendering template "test/infopanel.html.ep"
[debug] [9k490l9k] Rendering template "test/job_next_previous.html.ep"
[debug] [9k490l9k] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [9k490l9k] Rendering cached template "layouts/navbar.html.ep"
[debug] [9k490l9k] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [9k490l9k] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [9k490l9k] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [9k490l9k] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [9k490l9k] 200 OK (0.047101s, 21.231/s)
[debug] [V-lWD3Xz] GET "/asset/191ae26f71/test_result.js"
[debug] [V-lWD3Xz] Routing to a callback
[debug] [V-lWD3Xz] 200 OK (0.000618s, 1618.123/s)
[debug] [HQtQFNzS] GET "/asset/9d0a23fccf/job_next_previous.js"
[debug] [HQtQFNzS] Routing to a callback
[debug] [HQtQFNzS] 200 OK (0.000456s, 2192.982/s)
[debug] [kEzXp2_m] GET "/tests/1/settings_ajax"
[debug] [kEzXp2_m] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [kEzXp2_m] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "settings"
ok 12 - scheduled test page
[debug] [kEzXp2_m] Rendering template "test/settings.html.ep"
[debug] [kEzXp2_m] 200 OK (0.010431s, 95.868/s)
[debug] [Zx5REM5H] GET "/tests/1/status"
[debug] [Zx5REM5H] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [Zx5REM5H] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [Zx5REM5H] 200 OK (0.006048s, 165.344/s)

Note that [warn] [pid:10830] Unable to wakeup scheduler: Connection refused is also logged on good runs.

Then it starts the first tests:

ok 13 - test 1 is scheduled
ok 14 - no javascript warnings or errors after test 1 was scheduled
ok 15 - Worker started as IPC::Run=HASH(0x55c720dd51d8)
# Waiting for job to become available
[info] [pid:8562] worker 1:
 - config file:           /tmp/qwV5zLprfY/full-stack.d/config/workers.ini
 - worker hostname:       linux-9lzf
 - isotovideo version:    20
 - websocket API version: 1
 - web UI hosts:          http://localhost:9526
 - class:                 qemu_i386,qemu_x86_64
 - no cleanup:            no
 - pool directory:        /tmp/qwV5zLprfY/full-stack.d/openqa/pool/1
[info] [pid:8562] Project dir for host http://localhost:9526 is /tmp/qwV5zLprfY/full-stack.d/openqa/share
[info] [pid:8562] Registering with openQA http://localhost:9526
[debug] [sf7QT3F1] POST "/api/v1/workers"
[debug] [sf7QT3F1] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [sf7QT3F1] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Worker" and action "create"
[debug] [pid:8421] +=----------------=+
[debug] [pid:8421] -> Scheduling new jobs.
[debug] [pid:8421]       Free workers: 1/1
[debug] [pid:8421]       Scheduled jobs: 1
[debug] [pid:8421] Scheduler took 0.02375s to perform operations and allocated 0 jobs
ok 16 - Wait for jQuery successful: result panel shows '(?^:State: running)'
[debug] [sf7QT3F1] 200 OK (0.044851s, 22.296/s)
[info] [pid:8562] Establishing ws connection via ws://localhost:9526/api/v1/ws/1
[debug] [8UXw8Z5v] GET "/api/v1/ws/1"
[debug] [8UXw8Z5v] Routing to a callback
[debug] [8UXw8Z5v] 302 Found (0.00095s, 1052.632/s)
[debug] [NSaiuJ1m] GET "/ws/1"
[debug] [NSaiuJ1m] Routing to controller "OpenQA::Shared::Controller::Auth" and action "check"
[debug] [NSaiuJ1m] Routing to controller "OpenQA::WebSockets::Controller::Worker" and action "ws"
[debug] [NSaiuJ1m] 101 Switching Protocols (0.010989s, 91.000/s)
[info] [pid:8562] Registered and connected via websockets with openQA host http://localhost:9526 and worker ID 1
[debug] [pid:8441] Received from worker "1" worker_status message "{
  "status" => "free",
  "type" => "worker_status"
}
"
[debug] [pid:8441] Updating seen of worker 1 from worker_status
[debug] [pid:8441] Sent population to worker: { population => 1, type => "info" }
ok 17 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 18 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 19 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 20 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 21 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 22 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 23 - Wait for jQuery successful: result panel shows '(?^:State: running)'
[debug] [ZyEL06vP] GET "/tests/1/status"
[debug] [ZyEL06vP] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [ZyEL06vP] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [ZyEL06vP] 200 OK (0.013374s, 74.772/s)

It repeats this check until:

[debug] [pid:8441] Received from worker "1" worker_status message "{
  "status" => "free",
  "type" => "worker_status"
}
"
[debug] [pid:8441] Updating seen of worker 1 from worker_status
[debug] [5wZkuUDj] GET "/tests/1/status"
[debug] [5wZkuUDj] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [pid:8441] Sent population to worker: { population => 1, type => "info" }
[debug] [5wZkuUDj] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [5wZkuUDj] 200 OK (0.007515s, 133.067/s)
ok 364 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 365 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 366 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 367 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 368 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 369 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 370 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 371 - Wait for jQuery successful: result panel shows '(?^:State: running)'
[debug] [SyD32RgA] GET "/tests/1/status"
[debug] [SyD32RgA] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:8442] Unrecognized referer 'localhost'
[debug] [SyD32RgA] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [SyD32RgA] 200 OK (0.012703s, 78.722/s)
ok 372 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 373 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 374 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 375 - Wait for jQuery successful: result panel shows '(?^:State: running)'
# full result panel contents:
# State: scheduled
# Scheduled product: job has not been created by posting an ISO
# 50
not ok 376 - test 1 is running

#   Failed test 'test 1 is running'
#   at t/full-stack.t line 125.
# Subtest: wait until developer console becomes available
[debug] [wdriZMjH] GET "/tests/1/developer/ws-console"
[debug] [wdriZMjH] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [wdriZMjH] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_admin"
[debug] [wdriZMjH] Routing to controller "OpenQA::WebAPI::Controller::Developer" and action "ws_console"
[debug] [wdriZMjH] Rendering template "developer/ws_console.html.ep"
[debug] [wdriZMjH] Rendering cached template "layouts/info.html.ep"
[debug] [wdriZMjH] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [wdriZMjH] Rendering cached template "layouts/navbar.html.ep"
[debug] [wdriZMjH] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [wdriZMjH] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [wdriZMjH] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [wdriZMjH] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [wdriZMjH] 200 OK (0.014859s, 67.299/s)
[debug] [esFTBVco] GET "/asset/7431130272/ws_console.js"
[debug] [esFTBVco] Routing to a callback
[debug] [esFTBVco] 200 OK (0.000559s, 1788.909/s)
[debug] [DoJSHgT7] GET "/asset/bfaa1fb4a9/ws_console.css"
[debug] [DoJSHgT7] Routing to a callback
[debug] [DoJSHgT7] 200 OK (0.00047s, 2127.660/s)
    # Waiting for URL for os-autoinst cmd srv to become available
    # waiting for worker to propagate URL for os-autoinst cmd srv
[debug] [l4zPgM2n] GET "/tests/1/developer/ws-console"
[debug] [l4zPgM2n] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [l4zPgM2n] Routing to controller "OpenQA::Shared::Controller::Session" and action "ensure_admin"
[debug] [l4zPgM2n] Routing to controller "OpenQA::WebAPI::Controller::Developer" and action "ws_console"
[debug] [l4zPgM2n] Rendering cached template "developer/ws_console.html.ep"
[debug] [l4zPgM2n] Rendering cached template "layouts/info.html.ep"
[debug] [l4zPgM2n] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [l4zPgM2n] Rendering cached template "layouts/navbar.html.ep"
[debug] [l4zPgM2n] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [l4zPgM2n] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [l4zPgM2n] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [l4zPgM2n] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [l4zPgM2n] 200 OK (0.011366s, 87.982/s)
    ok 1 - Wait for jQuery successful: developer console available
    # waiting for worker to propagate URL for os-autoinst cmd srv

Then the log continues to repeat itself until:

    ok 60 - Wait for jQuery successful: developer console available
Bailout called.  Further testing stopped:  URL for os-autoinst cmd srv not available
Bail out!  URL for os-autoinst cmd srv not available
FAILED--Further testing stopped: URL for os-autoinst cmd srv not available
## count_fail_ratio: Run: 5. Fails: 1. Fail ratio 20%
## Run 6

A good run looks like this:

prove -l -v t/full-stack.t
t/full-stack.t .. 
# OPENQA_BASEDIR: /tmp/P91IKrUsaD/full-stack.d
# OPENQA_CONFIG: /tmp/P91IKrUsaD/full-stack.d/config
[INFO] using database schema "public"
[info] [pid:10811] setting database search path to public when registering Minion plugin
ok 1 - assets are prefetched
# Used ports: {}
# Starting WebSocket service. Port: 9527
# Bogus: 0 | No wait: 0
# Starting WebUI service. Port: 9526
# PID of openqa-websocket: 10829
# PID of openqa-webapi: 10830
[info] [pid:10830] setting database search path to public when registering Minion plugin
[info] Listening at "http://127.0.0.1:9527"
Web application available at http://127.0.0.1:9527
[debug] [USVkAAtY] GET "/"
[debug] [USVkAAtY] Routing to a callback
[debug] [USVkAAtY] Rendering template "main/index.html.ep"
[debug] [USVkAAtY] Rendering template "layouts/info.html.ep"
[debug] [USVkAAtY] Rendering template "branding/openSUSE/docbox.html.ep"
[debug] [USVkAAtY] Rendering template "branding/plain/docbox.html.ep"
[debug] [USVkAAtY] Rendering template "branding/openSUSE/sponsorbox.html.ep"
[debug] [USVkAAtY] Rendering template "layouts/bootstrap.html.ep"
[debug] [USVkAAtY] Rendering template "layouts/navbar.html.ep"
[debug] [USVkAAtY] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [USVkAAtY] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [USVkAAtY] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [USVkAAtY] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [USVkAAtY] 200 OK (0.038354s, 26.073/s)
[debug] [7wOne7DP] GET "/asset/92081e1a5e/bootstrap.css"
[debug] [7wOne7DP] Routing to a callback
[debug] [7wOne7DP] 200 OK (0.000634s, 1577.287/s)
[debug] [E31Jq_cF] GET "/asset/a91164e414/bootstrap.js"
[debug] [E31Jq_cF] Routing to a callback
[debug] [E31Jq_cF] 200 OK (0.000683s, 1464.129/s)
[debug] [rzyZOC0a] GET "/asset/518553d3d2/logo.svg"
[debug] [rzyZOC0a] Routing to a callback
[debug] [rzyZOC0a] 200 OK (0.000769s, 1300.390/s)
[debug] [AJ8guRAu] GET "/asset/308c1fc4a9/suse.png"
[debug] [AJ8guRAu] Routing to a callback
[debug] [AJ8guRAu] 200 OK (0.00083s, 1204.819/s)
[debug] [Mps56N1m] GET "/dashboard_build_results"
[debug] [Mps56N1m] Routing to controller "OpenQA::WebAPI::Controller::Main" and action "dashboard_build_results"
[debug] [Mps56N1m] Rendering template "main/dashboard_build_results.html.ep"
[debug] [Mps56N1m] 200 OK (0.002733s, 365.898/s)
[debug] [7nj7og9K] GET "/asset/27c88aea39/fa-solid-900.woff2"
[debug] [7nj7og9K] Routing to a callback
[debug] [7nj7og9K] 200 OK (0.000485s, 2061.856/s)
ok 2 - resultdir "/tmp/P91IKrUsaD/full-stack.d/openqa/testresults" exists
# PID of openqa-livehandler: 10925
ok 3 - on main page
ok 4 - no one logged in
[debug] [wLyqVPMc] GET "/login"
[debug] [wLyqVPMc] Routing to controller "OpenQA::Shared::Controller::Session" and action "create"
[debug] [wLyqVPMc] 302 Found (0.018786s, 53.231/s)
[debug] [mM-tv_-T] GET "/"
[debug] [mM-tv_-T] Routing to a callback
[debug] [mM-tv_-T] Rendering cached template "main/index.html.ep"
[debug] [mM-tv_-T] Rendering cached template "layouts/info.html.ep"
[debug] [mM-tv_-T] Rendering cached template "branding/openSUSE/docbox.html.ep"
[debug] [mM-tv_-T] Rendering cached template "branding/plain/docbox.html.ep"
[debug] [mM-tv_-T] Rendering cached template "branding/openSUSE/sponsorbox.html.ep"
[debug] [mM-tv_-T] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [mM-tv_-T] Rendering cached template "layouts/navbar.html.ep"
[debug] [mM-tv_-T] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [mM-tv_-T] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [mM-tv_-T] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [mM-tv_-T] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [mM-tv_-T] 200 OK (0.010243s, 97.628/s)
[debug] [5yF7uzuE] GET "/dashboard_build_results"
[debug] [5yF7uzuE] Routing to controller "OpenQA::WebAPI::Controller::Main" and action "dashboard_build_results"
[debug] [5yF7uzuE] Rendering cached template "main/dashboard_build_results.html.ep"
[debug] [5yF7uzuE] 200 OK (0.001599s, 625.391/s)
ok 5 - click_ok'Login clicked'
ok 6 - back on main page
ok 7 - click_ok'disable tour permanently'
[debug] [YGzDclGt] POST "/api/v1/feature"
[debug] [YGzDclGt] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth"
ok 8 - click_ok'confirm dismissing tour'
[debug] API auth by user: Demo, operator: 1
[debug] [YGzDclGt] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Feature" and action "informed"
[debug] [YGzDclGt] 200 OK (0.007435s, 134.499/s)
[debug] [ltQDsgwo] POST "/api/v1/jobs"
[debug] [ltQDsgwo] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [ltQDsgwo] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Job" and action "create"
[debug] [ltQDsgwo] 200 OK (0.034917s, 28.639/s)
[warn] [pid:10830] Unable to wakeup scheduler: Connection refused
[debug] [OiVuZawQ] GET "/tests"
[debug] [OiVuZawQ] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list"
[debug] [OiVuZawQ] Rendering template "test/list.html.ep"
[debug] [OiVuZawQ] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [OiVuZawQ] Rendering cached template "layouts/navbar.html.ep"
[debug] [OiVuZawQ] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [OiVuZawQ] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [OiVuZawQ] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [OiVuZawQ] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [OiVuZawQ] 200 OK (0.009945s, 100.553/s)
[debug] [_-PXVswH] GET "/tests/list_running_ajax"
[debug] [_-PXVswH] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_running_ajax"
[debug] [_-PXVswH] 200 OK (0.0071s, 140.845/s)
[debug] [u3L-qS-O] GET "/tests/list_scheduled_ajax"
[debug] [u3L-qS-O] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_scheduled_ajax"
[debug] [u3L-qS-O] 200 OK (0.003233s, 309.310/s)
[debug] [UcfJYk0k] GET "/tests/list_ajax"
[debug] [UcfJYk0k] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "list_ajax"
[debug] [UcfJYk0k] 200 OK (0.018081s, 55.307/s)
ok 9 - click_ok'Clicked All Tests to look for scheduled job'
[debug] [a9T5UGnY] GET "/asset/e38e3130ae/fa-regular-400.woff2"
[debug] [a9T5UGnY] Routing to a callback
[debug] [a9T5UGnY] 200 OK (0.000675s, 1481.481/s)
ok 10 - tests followed
ok 11 - Wait for jQuery successful: wait before checking for scheduled job
[debug] [pSPI-Slw] GET "/tests/1"
[debug] [pSPI-Slw] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [pSPI-Slw] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "show"
[debug] [pSPI-Slw] Rendering template "test/result.html.ep"
[debug] [pSPI-Slw] Rendering cached template "layouts/info.html.ep"
[debug] [pSPI-Slw] Rendering template "test/infopanel.html.ep"
[debug] [pSPI-Slw] Rendering template "test/job_next_previous.html.ep"
[debug] [pSPI-Slw] Rendering cached template "layouts/bootstrap.html.ep"
[debug] [pSPI-Slw] Rendering cached template "layouts/navbar.html.ep"
[debug] [pSPI-Slw] Template "branding/openSUSE/links_footer_left.html.ep" not found
[debug] [pSPI-Slw] Template "branding/plain/links_footer_left.html.ep" not found
[debug] [pSPI-Slw] Template "branding/openSUSE/links_footer_right.html.ep" not found
[debug] [pSPI-Slw] Template "branding/plain/links_footer_right.html.ep" not found
[debug] [pSPI-Slw] 200 OK (0.046771s, 21.381/s)
[debug] [d9FVKsUx] GET "/asset/191ae26f71/test_result.js"
[debug] [d9FVKsUx] Routing to a callback
[debug] [d9FVKsUx] 200 OK (0.000773s, 1293.661/s)
[debug] [CU1am6PS] GET "/asset/9d0a23fccf/job_next_previous.js"
[debug] [CU1am6PS] Routing to a callback
[debug] [CU1am6PS] 200 OK (0.000692s, 1445.087/s)
[debug] [TSdhiNik] GET "/tests/1/settings_ajax"
[debug] [TSdhiNik] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [TSdhiNik] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "settings"
ok 12 - scheduled test page
[debug] [TSdhiNik] Rendering template "test/settings.html.ep"
[debug] [TSdhiNik] 200 OK (0.012395s, 80.678/s)
[debug] [aT7TmajJ] GET "/tests/1/status"
[debug] [aT7TmajJ] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [aT7TmajJ] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [aT7TmajJ] 200 OK (0.005329s, 187.652/s)
ok 13 - test 1 is scheduled
ok 14 - no javascript warnings or errors after test 1 was scheduled
ok 15 - Worker started as IPC::Run=HASH(0x55b0b18c2eb8)
# Waiting for job to become available
[info] [pid:10944] worker 1:
 - config file:           /tmp/P91IKrUsaD/full-stack.d/config/workers.ini
 - worker hostname:       linux-9lzf
 - isotovideo version:    20
 - websocket API version: 1
 - web UI hosts:          http://localhost:9526
 - class:                 qemu_i386,qemu_x86_64
 - no cleanup:            no
 - pool directory:        /tmp/P91IKrUsaD/full-stack.d/openqa/pool/1
[info] [pid:10944] Project dir for host http://localhost:9526 is /tmp/P91IKrUsaD/full-stack.d/openqa/share
[info] [pid:10944] Registering with openQA http://localhost:9526
[debug] [KNHkG1ZS] POST "/api/v1/workers"
[debug] [KNHkG1ZS] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [KNHkG1ZS] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Worker" and action "create"
[debug] [KNHkG1ZS] 200 OK (0.038105s, 26.243/s)
[info] [pid:10944] Establishing ws connection via ws://localhost:9526/api/v1/ws/1
[debug] [HgCW95vQ] GET "/api/v1/ws/1"
[debug] [HgCW95vQ] Routing to a callback
[debug] [HgCW95vQ] 302 Found (0.000876s, 1141.553/s)
[debug] [TpzCGPYr] GET "/ws/1"
[debug] [TpzCGPYr] Routing to controller "OpenQA::Shared::Controller::Auth" and action "check"
[debug] [TpzCGPYr] Routing to controller "OpenQA::WebSockets::Controller::Worker" and action "ws"
[debug] [TpzCGPYr] 101 Switching Protocols (0.010318s, 96.918/s)
[info] [pid:10944] Registered and connected via websockets with openQA host http://localhost:9526 and worker ID 1
[debug] [pid:10829] Received from worker "1" worker_status message "{
  "status" => "free",
  "type" => "worker_status"
}
"
[debug] [pid:10829] Updating seen of worker 1 from worker_status
[debug] [pid:10811] +=----------------=+
[debug] [pid:10811] -> Scheduling new jobs.
[debug] [pid:10811]      Free workers: 1/1
[debug] [pid:10829] Sent population to worker: { population => 1, type => "info" }
[debug] [pid:10811]      Scheduled jobs: 1
[debug] [pid:10811] Need to schedule 1 parallel jobs for job 1 (with priority 50)
[debug] [pid:10811] Assigned job '1' to worker ID '1'
[debug] [pid:10811] [Job#1] Prepare for being processed by worker 1
[debug] [xOVzP8ht] POST "/api/send_job"
[debug] [xOVzP8ht] Routing to controller "OpenQA::Shared::Controller::Auth" and action "check"
[debug] [xOVzP8ht] Routing to controller "OpenQA::WebSockets::Controller::API" and action "send_job"
[debug] [pid:10829] Started to send message to 1 for job(s) 1
[debug] [xOVzP8ht] 200 OK (0.001607s, 622.278/s)
[debug] [pid:10811] Sent job(s) '1' to worker '1'
[debug] [pid:10944] Accepting job 1 from http://localhost:9526.
[debug] [pid:10811] Scheduler took 0.05587s to perform operations and allocated 1 jobs
[debug] [pid:10944] Setting job 1 from http://localhost:9526 up
[debug] [pid:10944] Preparing Mojo::IOLoop::ReadWriteProcess::Session
[debug] [pid:10811] Allocated: { job => 1, worker => 1 }
[info] [pid:10944] +++ setup notes +++
[info] [pid:10944] Running on linux-9lzf:1 (Linux 5.10.12-1-default #1 SMP Sat Jan 30 19:15:49 UTC 2021 (a3c8888) x86_64)
[debug] [pid:10944] Job settings:
[debug] [pid:10944] 
    ARCH=i386
    BUILD=1
    CDMODEL=ide-cd
    DISTRI=tinycore
    FLAVOR=flavor
    HDDMODEL=ide-drive
    INTEGRATION_TESTS=1
    ISO=Core-7.2.iso
    JOBTOKEN=MlQq7aruTeKsj6Kd
    LOG_LEVEL=debug
    MACHINE=coolone
    NAME=00000001-tinycore-1-flavor-i386-Build1-core@coolone
    OPENQA_HOSTNAME=localhost:9526
    OPENQA_URL=http://localhost:9526
    PAUSE_AT=shutdown
    PRJDIR=/tmp/P91IKrUsaD/full-stack.d/openqa/share
    PUBLISH_HDD_1=core-hdd.qcow2
    QEMU=i386
    QEMUPORT=20012
    QEMU_NO_FDC_SET=1
    QEMU_NO_TABLET=1
    RETRY_DELAY=5
    RETRY_DELAY_IF_WEBUI_BUSY=60
    TEST=core
    UEFI_PFLASH_VARS=/usr/share/qemu/ovmf-x86_64.bin
    VERSION=1
    VNC=91
    WORKER_CLASS=qemu_i386,qemu_x86_64
    WORKER_HOSTNAME=127.0.0.1
    WORKER_ID=1
    WORKER_INSTANCE=1
[info] [pid:10944] Preparing cgroup to start isotovideo
[warn] [pid:10944] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd/openqa: Permission denied at /usr/lib/perl5/vendor_perl/5.32.0/Mojo/File.pm line 84.

[info] [pid:10944] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
[info] [pid:10944] Starting isotovideo container
[debug] [pid:10944] Registered process:10952
[info] [pid:10952] 10952: WORKING 1
[debug] [pid:10952] +++ worker notes +++
[info] [pid:10944] isotovideo has been started (PID: 10952)
[debug] [pid:10944] Running job 1 from http://localhost:9526: 00000001-tinycore-1-flavor-i386-Build1-core@coolone.
ok 16 - Wait for jQuery successful: result panel shows '(?^:State: running)'
[debug] [pid:10944] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
[debug] [MjM647y1] POST "/api/v1/jobs/1/status"
[debug] [MjM647y1] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [MjM647y1] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Job" and action "update_status"
[debug] [pid:10829] Worker 1 accepted job 1
[debug] [pid:10829] Received from worker "1" worker_status message "{
  "current_webui_host" => "http://localhost:9526",
  "job" => {
    "URL" => "http://localhost:20013/MlQq7aruTeKsj6Kd",
    "assets" => {
      "iso" => [
        "Core-7.2.iso"
      ]
    },
    "assigned_worker_id" => 1,
    "blocked_by_id" => undef,
    "clone_id" => undef,
    "group_id" => undef,
    "id" => 1,
    "name" => "tinycore-1-flavor-i386-Build1-core\@coolone",
    "priority" => 50,
    "result" => "none",
    "settings" => {
      "ARCH" => "i386",
      "BUILD" => 1,
      "CDMODEL" => "ide-cd",
      "DISTRI" => "tinycore",
      "FLAVOR" => "flavor",
      "HDDMODEL" => "ide-drive",
      "INTEGRATION_TESTS" => 1,
      "ISO" => "Core-7.2.iso",
      "JOBTOKEN" => "MlQq7aruTeKsj6Kd",
      "LOG_LEVEL" => "debug",
      "MACHINE" => "coolone",
      "NAME" => "00000001-tinycore-1-flavor-i386-Build1-core\@coolone",
      "OPENQA_HOSTNAME" => "localhost:9526",
      "PAUSE_AT" => "shutdown",
      "PUBLISH_HDD_1" => "core-hdd.qcow2",
      "QEMU" => "i386",
      "QEMUPORT" => 20012,
      "QEMU_NO_FDC_SET" => 1,
      "QEMU_NO_TABLET" => 1,
      "RETRY_DELAY" => 5,
      "RETRY_DELAY_IF_WEBUI_BUSY" => 60,
      "TEST" => "core",
      "UEFI_PFLASH_VARS" => "/usr/share/qemu/ovmf-x86_64.bin",
      "VERSION" => 1,
      "VNC" => 91,
      "WORKER_CLASS" => "qemu_i386,qemu_x86_64",
      "WORKER_HOSTNAME" => "127.0.0.1"
    },
    "state" => "assigned",
    "t_finished" => undef,
    "t_started" => undef,
    "test" => "core"
  },
  "status" => "working",
  "type" => "worker_status"
}
"
[debug] [pid:10829] Updating seen of worker 1 from worker_status
[debug] [pid:10829] Found job 1 in DB from worker_status update sent by worker 1
[debug] [pid:10829] Received request has job id: 1
[debug] [MjM647y1] 200 OK (0.021039s, 47.531/s)
[debug] [pid:10829] Worker 1 for job 1 has token MlQq7aruTeKsj6Kd
[debug] [pid:10829] Received request has token: MlQq7aruTeKsj6Kd
[debug] [pid:10829] Sent population to worker: { population => 1, type => "info" }
ok 17 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 18 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 19 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 20 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 21 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 22 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 23 - Wait for jQuery successful: result panel shows '(?^:State: running)'
[debug] [z7vM-Gk_] GET "/tests/1/status"
[debug] [z7vM-Gk_] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [z7vM-Gk_] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "status"
[debug] [z7vM-Gk_] 200 OK (0.007556s, 132.345/s)
[debug] [mappzYME] GET "/tests/1/live_ajax"
[debug] [mappzYME] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [mappzYME] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "live"
[debug] [mappzYME] Rendering template "test/live.html.ep"
[debug] [mappzYME] 200 OK (0.017205s, 58.123/s)
[debug] [OLg7s8NJ] GET "/tests/1/details_ajax"
[debug] [OLg7s8NJ] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [OLg7s8NJ] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "details"
[debug] [OLg7s8NJ] Rendering template "test/details.html.ep"
[debug] [OLg7s8NJ] Rendering template "branding/openSUSE/external_reporting.html.ep"
[debug] [T0ruFxyU] GET "/liveviewhandler/tests/1/developer/ws-proxy/status"
[debug] [T0ruFxyU] Routing to controller "OpenQA::LiveHandler::Controller::LiveViewHandler" and action "proxy_status"
[debug] [OLg7s8NJ] 200 OK (0.024325s, 41.110/s)
[debug] [pqcKysXx] GET "/tests/1/infopanel_ajax"
[debug] [pqcKysXx] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [pqcKysXx] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "infopanel"
[debug] [pqcKysXx] Rendering cached template "test/infopanel.html.ep"
[debug] [pid:10925] connecting to os-autoinst command server for job 1 at ws://127.0.0.1:20013/MlQq7aruTeKsj6Kd/ws
[debug] [T0ruFxyU] 101 Switching Protocols (0.023926s, 41.796/s)
[debug] [pqcKysXx] 200 OK (0.019809s, 50.482/s)
[debug] [Z-yHmwsP] GET "/tests/1/streaming"
[debug] [Z-yHmwsP] Routing to controller "OpenQA::WebAPI::Controller::Test" and action "referer_check"
[debug] [pid:10830] Unrecognized referer 'localhost'
[debug] [Z-yHmwsP] Routing to controller "OpenQA::WebAPI::Controller::Running" and action "streaming"
[debug] [pid:10830] Asking the worker to start providing livestream
[debug] [5V__w0HN] POST "/api/send_msg"
[debug] [5V__w0HN] Routing to controller "OpenQA::Shared::Controller::Auth" and action "check"
[debug] [5V__w0HN] Routing to controller "OpenQA::WebSockets::Controller::API" and action "send_msg"
[debug] [5V__w0HN] 200 OK (0.00124s, 806.452/s)
[debug] [pid:10944] Starting livelog
[debug] [pid:10944] REST-API call: POST http://localhost:9526/api/v1/jobs/1/status
[debug] [i90978MY] POST "/api/v1/jobs/1/status"
[debug] [i90978MY] Routing to controller "OpenQA::Shared::Controller::Auth" and action "auth_operator"
[debug] API key from client: *1234567890ABCDEF*
[debug] Key is for user 'Demo'
[debug] API auth by user: Demo, operator: 1
[debug] [i90978MY] Routing to controller "OpenQA::WebAPI::Controller::API::V1::Job" and action "update_status"
[debug] [i90978MY] 200 OK (0.027066s, 36.947/s)
ok 24 - Wait for jQuery successful: result panel shows '(?^:State: running)'
ok 25 - test 1 is running
…
Actions #11

Updated by mkittler about 3 years ago

Just for the record: The difference between the good and failed runs is obviously that the scheduler does not assign the job to the worker in the failed runs although the free worker and the scheduler job seem to be acknowledged by the scheduler:

[debug] [pid:8421] +=----------------=+
[debug] [pid:8421] -> Scheduling new jobs.
[debug] [pid:8421]       Free workers: 1/1
[debug] [pid:8421]       Scheduled jobs: 1
[debug] [pid:8421] Scheduler took 0.02375s to perform operations and allocated 0 jobs

I've also seen another occurrence within the CI now and it looks exactly the same.

In the good case it looks like this:

[debug] [pid:10811] +=----------------=+
[debug] [pid:10811] -> Scheduling new jobs.
[debug] [pid:10811]      Free workers: 1/1
[debug] [pid:10829] Sent population to worker: { population => 1, type => "info" }
[debug] [pid:10811]      Scheduled jobs: 1
[debug] [pid:10811] Need to schedule 1 parallel jobs for job 1 (with priority 50)
[debug] [pid:10811] Assigned job '1' to worker ID '1'
Actions #12

Updated by mkittler about 3 years ago

The problem also affects 43-scheduling-and-worker-scalability.t where it is much easier to reproduce, e.g.:

time env runs=400 OPENQA_CHECKOUT=openQA-2 SCALABILITY_TEST=1 SCALABILITY_TEST_JOB_COUNT=1 SCALABILITY_TEST_WORKER_COUNT=1 "$OPENQA_BASEDIR/repos/okurz-github-scripts/count_fail_ratio" openqa-test t/43-scheduling-and-worker-scalability.t

The 400 runs are not really required; at some point the error ratio was even > 30 %.

So in the simple scenario where we have one job and one free worker the scheduler is not able to assign the job all the time.

Actions #13

Updated by mkittler about 3 years ago

  • Status changed from Workable to In Progress

Unfortunately the issue I saw within the stability test is a different one, see https://github.com/os-autoinst/openQA/pull/3714.

Actions #14

Updated by mkittler about 3 years ago

With more verbose output the worker information of a good run looks like that:

[debug] [pid:20004] {
  alive => 1,
  connected => 1,
  error => undef,
  host => "linux-9lzf",
  id => 1,
  instance => 1,
  properties => {
    CPU_ARCH => "x86_64",
    CPU_FLAGS => "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d",
    CPU_MODELNAME => "Intel(R) Core(TM) i7-6820HQ CPU \@ 2.70GHz",
    CPU_OPMODE => "32-bit, 64-bit",
    ISOTOVIDEO_INTERFACE_VERSION => 20,
    MEM_MAX => 31947,
    WEBSOCKET_API_VERSION => 1,
    WORKER_CLASS => "qemu_i386,qemu_x86_64",
  },
  status => "idle",
  websocket => 0,
}

In case of a bad run important information like the WORKER_CLASS is missing:

[debug] [pid:2937] {
  alive => 1,
  connected => 1,
  error => undef,
  host => "linux-9lzf",
  id => 1,
  instance => 1,
  properties => {
    CPU_MODELNAME => "Intel(R) Core(TM) i7-6820HQ CPU \@ 2.70GHz",
    MEM_MAX => 31947,
    WEBSOCKET_API_VERSION => 1,
  },
  status => "idle",
  websocket => 0,
}

That explains why the scheduler doesn't consider the job. Either the worker fails to pass the information entirely or the test doesn't wait correctly and sometimes the WORKER_CLASS is just not passed fast enough.

Actions #15

Updated by openqa_review about 3 years ago

  • Due date set to 2021-02-24

Setting due date based on mean cycle time of SUSE QE Tools

Actions #16

Updated by mkittler about 3 years ago

PR which should fix the actual problem (see last commit): https://github.com/os-autoinst/openQA/pull/3714

So far it seems to work. (I'm still running the test locally within a loop.)

Actions #17

Updated by mkittler about 3 years ago

  • Status changed from In Progress to Resolved

The PR has been merged and at least judging by my local tests it fixes the problem. We can re-open the ticket when it fails within the CI after all (in a way related to this ticket).

Actions

Also available in: Atom PDF