The failure of the fullstack test could be a problem with the worker websocket connection. The problem is that the worker disconnects in the critical moment when the job is being assigned:
# Subtest: Cache tests
ok 1 - cache.sqlite is not present
# Starting cache minion worker
# Waiting for 'cache service at http://127.0.0.1:9530' to become available (timeout: 60)
# Starting worker cache service on port 9530
# Waiting for 'cache service worker' to become available (timeout: 60)
[debug] [pid:367] Unable to wakeup scheduler: Connection refused. Retry scheduled
ok 2 - test 5 is scheduled
ok 3 - Worker started as IPC::Run=HASH(0x55edc96d1d28)
ok 4 - job with worker class qemu_i386 scheduled
[info] [pid:817] worker 1:
- config file: /home/squamata/project/test-results/fullstack/full-stack.d/config/workers.ini
- name used to register: 372c80df1da0
- worker address (WORKER_HOSTNAME): localhost
- isotovideo version: 40
- websocket API version: 1
- web UI hosts: http://localhost:9526
- class: qemu_i386,qemu_x86_64
- no cleanup: no
- pool directory: /home/squamata/project/test-results/fullstack/full-stack.d/openqa/pool/1
[info] [pid:817] Project dir for host http://localhost:9526 is /home/squamata/project/test-results/fullstack/full-stack.d/openqa/share
[info] [pid:817] Registering with openQA http://localhost:9526
ok 5 - at least one free worker with class qemu_i386 registered
[debug] [pid:351] Scheduling: Free workers: 1/1; Scheduled jobs: 1
[info] [pid:817] Establishing ws connection via ws://localhost:9526/api/v1/ws/1
[debug] [pid:351] Need to schedule 1 parallel jobs for job 5 (with priority 50)
[debug] [pid:351] limit reached, scheduling no additional jobs (max_running_jobs=-1, free workers=1, running=0, allocated=1)
[debug] [pid:351] Assigned job '5' to worker ID '1'
[debug] [pid:351] [Job#5] Prepare for being processed by worker 1
[warn] [pid:351] Failed sending job(s) '5' to worker '1': Unable to assign job to worker 1: the worker is not connected anymore
[info] [pid:817] Registered and connected via websockets with openQA host http://localhost:9526 and worker ID 1
[debug] [pid:366] Updating seen of worker 1 from worker_status (free)
[debug] [pid:351] Job 5 reset to state scheduled
[debug] [pid:351] Scheduler took 0.11739s to perform operations and allocated 0 jobs
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 0 of 3240)
…
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 269 of 3240)
[debug] [pid:366] Updating seen of worker 1 from worker_status (free)
[debug] [pid:366] Rescheduling jobs assigned to worker 1
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 270 of 3240)
…
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 637 of 3240)
# Wait for jQuery successful: wait_for_re[debug] [pid:366] Updating seen of worker 1 from worker_status (free)
[debug] [pid:366] Rescheduling jobs assigned to worker 1sult_panel: waiting for '(?^u:State: running)' (count 638 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 639 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 640 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 641 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 642 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 643 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 644 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 645 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 646 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 647 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 648 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 649 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 650 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 651 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 652 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 653 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 654 of 3240)
# Wait for jQuery successful: wait_for_re[debug] [pid:366] Updating seen of worker 1 from worker_status (free)
[debug] [pid:366] Rescheduling jobs assigned to worker 1sult_panel: waiting for '(?^u:State: running)' (count 655 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 656 of 3240)
…
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 3238 of 3240)
# Wait for jQuery successful: wait_for_result_panel: waiting for '(?^u:State: running)' (count 3239 of 3240)
ok 6 - No unexpected js warnings
not ok 7 - Expected result not found
not ok 8 - job 5 running
# job info: {
# 'blocked_by_id' => undef,
# 'clone_id' => undef,
# 'group_id' => undef,
# 'id' => 5,
# 'name' => 'tinycore-1-flavor-i386-Build1-core@coolone',
# 'priority' => 50,
# 'result' => 'none',
# 'settings' => {
# 'ARCH' => 'i386',
# 'BUILD' => '1',
# 'DISTRI' => 'tinycore',
# 'FLAVOR' => 'flavor',
# 'INTEGRATION_TESTS' => '1',
# 'ISO' => 'Core-7.2.iso',
# 'MACHINE' => 'coolone',
# 'NAME' => '00000005-tinycore-1-flavor-i386-Build1-core@coolone',
# 'PUBLISH_HDD_1' => '',
# 'QEMU' => 'i386',
# 'QEMU_NO_FDC_SET' => '1',
# 'QEMU_NO_KVM' => '1',
# 'QEMU_NO_TABLET' => '1',
# 'TEST' => 'core',
# 'UEFI_PFLASH_VARS' => '/usr/share/qemu/ovmf-x86_64.bin',
# 'VERSION' => '1',
# 'WORKER_CLASS' => 'qemu_i386'
# },
# 'state' => 'scheduled',
# 't_finished' => undef,
# 't_started' => undef,
# 'test' => 'core'
# }
# Subtest: results of test 5
…
Not sure why the scheduler/ws-server cannot just do the assignment on the next tick. The worker is at least updating its status and considered free.
EDIT: Looks like the scheduling in the fullstack test happens explicitly (the scheduler is not constantly running in the background). So we start the worker, don' wait that it is actually ready and then try to assign jobs. That is a recipe for failures like this and I'm surprised we don't see such failures more often.