action #62015

closed

jobs incomplete without logs as some workers are rejected (was: Scheduler does not work)

Added by nadvornik over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Support
Target version:
Start date: 2020-01-10
Due date:
% Done: 0%
Estimated time:

Description

After updating to openQA-4.6.1578604167.a65685352-2147.1, I see intermittent scheduling problems.

The journal contains this:

Jan 10 16:22:19 sleposbuilder2 worker[18329]: [info] 28043: WORKING 70
Jan 10 16:22:19 sleposbuilder2 openqa-websockets[17154]: [warn] Worker 4 accepted job 70 which was never assigned to it or has already finished
Jan 10 16:22:19 sleposbuilder2 openqa[17156]: [info] Got status update for job 70 with unexpected worker ID 4 (expected 2, job is scheduled)
Jan 10 16:22:19 sleposbuilder2 worker[18329]: [error] REST-API error (POST http://sleposbuilder2.suse.cz/api/v1/jobs/70/status): 400 response: Got status update for job 70 with unexpected worker ID 4 (expected 2, job is scheduled) (remaining tries: 2)

Then the job fails.
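
For triage, the worker mismatch can be pulled out of a larger journal automatically. The following is a minimal sketch that assumes the message wording shown in the excerpt above; the function name `find_mismatches` and the pattern are illustrative and not part of openQA.

```python
import re

# Matches the rejection message from the journal excerpt above.
# The wording is taken from that excerpt; other openQA versions may differ.
MISMATCH = re.compile(
    r"Got status update for job (?P<job>\d+) "
    r"with unexpected worker ID (?P<got>\d+) "
    r"\(expected (?P<expected>\d+)"
)


def find_mismatches(lines):
    """Return unique (job_id, reporting_worker, expected_worker) tuples in order."""
    seen = []
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            entry = (int(m["job"]), int(m["got"]), int(m["expected"]))
            if entry not in seen:
                seen.append(entry)
    return seen


sample = [
    "Jan 10 16:22:19 sleposbuilder2 worker[18329]: [info] 28043: WORKING 70",
    "Jan 10 16:22:19 sleposbuilder2 openqa[17156]: [info] Got status update "
    "for job 70 with unexpected worker ID 4 (expected 2, job is scheduled)",
]
print(find_mismatches(sample))  # [(70, 4, 2)]
```

Running it over the full journal (for example, the lines of the attached logs) would list every job for which a worker other than the assigned one reported status.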

The PostgreSQL log contains this:
2020-01-10 16:22:19.831 CET openqa geekotest [17396]ERROR: duplicate key value violates unique constraint "workers_job_id"
2020-01-10 16:22:19.831 CET openqa geekotest [17396]DETAIL: Key (job_id)=(70) already exists.
2020-01-10 16:22:19.831 CET openqa geekotest [17396]STATEMENT: UPDATE workers SET job_id = $1, t_updated = $2 WHERE ( id = $3 )
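
The class of error in the PostgreSQL log can be reproduced in isolation. The sketch below is only an illustration, not the openQA schema: it uses an in-memory SQLite database in place of PostgreSQL, a reduced hypothetical `workers` table with just `id` and `job_id`, and a hypothetical helper `assign`. It shows that once one worker row holds a job ID, the same `UPDATE` for a second worker is rejected by the unique constraint on `job_id`.

```python
import sqlite3

# Reduced, hypothetical stand-in for the workers table: only the columns
# needed to show the uniqueness conflict. SQLite replaces PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workers (id INTEGER PRIMARY KEY, job_id INTEGER UNIQUE)")
conn.executemany(
    "INSERT INTO workers (id, job_id) VALUES (?, ?)",
    [(2, None), (4, None)],  # two idle workers; NULL job_id values do not conflict
)


def assign(worker_id, job_id):
    """Record job_id on the worker row; return False on a uniqueness conflict."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE workers SET job_id = ? WHERE id = ?",
                (job_id, worker_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = assign(2, 70)   # worker 2 gets job 70
second = assign(4, 70)  # worker 4 claims the same job and is rejected
print(first, second)  # True False
```

This matches the journal above in shape only: job 70 is expected on worker 2, and the update for worker 4 fails. It does not establish why worker 4 attempted the job in the first place.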

Sometimes the same tests work fine.

Full logs are attached.


Files

logs.tgz (35.4 KB), logs, nadvornik, 2020-01-10 15:32
log.gz (99.5 KB), logs, nadvornik, 2020-01-13 12:56

Related issues 5 (0 open, 5 closed)

Related to openQA Project - action #37638: Flaky fullstack test: 'Test 3 is scheduled' at t/full-stack.t (Resolved, okurz, 2018-06-21)
Related to openQA Project - action #59043: Fix unstable/flaky full-stack test, i.e. remove sleep, and ui tests (Resolved, okurz, 2019-11-04)
Related to openQA Project - action #59984: unstable test: t/05-scheduler-full.t (Resolved, okurz, 2019-11-18)
Related to openQA Project - action #62243: After latest updates, openQA has problematic behavior on Dell Precision 5810 (Resolved, okurz, 2020-01-17)
Related to openQA Project - action #62984: Fix problem with job-worker assignment resulting in API errors (Resolved, mkittler, 2020-02-03)
