action #43991

Scheduler stays busy after restarting/cloning a job

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status: Resolved
Start date: 19/11/2018
Priority: Normal
Due date:
Assignee: mkittler
% Done: 0%
Category: Concrete Bugs
Target version: Done
Difficulty: medium
Duration:

Description

After restarting or cloning a job, the scheduler stays busy. It does not matter whether the job actually gets executed or just stays in the state scheduled because no worker is available.

Restarting the scheduler helps. If jobs are already scheduled when the scheduler is restarted, and a worker then becomes available and starts running one of them, the issue isn't triggered. So apparently only adding a new job to the list of scheduled jobs triggers the problem, not picking a scheduled job for execution.

Setting OPENQA_SCHEDULER_SCHEDULE_TICK_MS to a high value doesn't help.

The strace output for the busy scheduler looks like this:

getpid()                                = 28160
sendto(5, "Q\0\0\0&SELECT COUNT( * ) FROM work"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 16384, 0, NULL, NULL) = 63
getpid()                                = 28160
getpid()                                = 28160
sendto(5, "Q\0\0\0\213SELECT me.id, me.host, me.i"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\300\0\7id\0\0\0Di\0\1\0\0\0\27\0\4\377\377\377\377\0\0host"..., 16384, 0, NULL, NULL) = 765
select(8, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)

(continues perpetually)
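
The select() call at the end of the trace uses a zero timeout ({tv_sec=0, tv_usec=0}), so it returns immediately instead of waiting, which is consistent with a loop that spins rather than sleeps between ticks. The following is only a minimal sketch of that effect, not the scheduler's actual code: count_polls is a hypothetical helper, and an idle socket pair stands in for file descriptor 3 from the trace.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;
use Time::HiRes qw(time);

# Hypothetical helper: count how many select() calls fit into $duration
# seconds when each call uses the given timeout (in seconds).
sub count_polls {
    my ($timeout, $duration) = @_;

    # An idle socket pair stands in for the scheduler's file descriptor;
    # nothing is ever written, so select() never reports it as readable.
    socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    my $select = IO::Select->new($reader);

    my $polls    = 0;
    my $deadline = time() + $duration;
    while (time() < $deadline) {
        $select->can_read($timeout);    # a timeout of 0 returns immediately
        $polls++;
    }
    return $polls;
}

my $busy  = count_polls(0,    0.2);    # like {tv_sec=0, tv_usec=0} in the trace
my $timed = count_polls(0.05, 0.2);    # blocks for up to 50 ms per call
print "zero timeout: $busy polls, 50 ms timeout: $timed polls\n";
```

With a zero timeout the loop iterates as fast as the CPU allows, whereas a non-zero timeout lets the process sleep in the kernel between iterations. This would also fit the observation that raising OPENQA_SCHEDULER_SCHEDULE_TICK_MS makes no difference, if the spinning happens outside the regular tick timer.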

I suspect the output corresponds to the following lines of Perl code in Scheduler.pm:

my $all_workers = schema->resultset("Workers")->count();

my @f_w = grep { !$_->dead && ($_->websocket_api_version() || 0) == WEBSOCKET_API_VERSION }
    schema->resultset("Workers")->search({job_id => undef})->all();

History

#1 Updated by mkittler over 1 year ago

  • Assignee set to mkittler
  • Target version set to Ready

#2 Updated by mkittler over 1 year ago

  • Status changed from New to In Progress

#3 Updated by mkittler over 1 year ago

  • Status changed from In Progress to Resolved
  • Target version changed from Ready to Done

The PR has been merged.
