action #43991

closed

Scheduler stays busy after restarting/cloning a job

Added by mkittler about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2018-11-19
Due date:
% Done: 0%
Estimated time:

Description

After restarting or cloning a job, the scheduler stays busy. It does not matter whether the job actually gets executed or just remains in the scheduled state because no worker is available.

Restarting the scheduler helps. If jobs are already scheduled when the scheduler is restarted, and a worker then becomes available and starts running one of them, the issue isn't triggered. So the issue is apparently triggered only when a new job is added to the list of scheduled jobs, not when a scheduled job is picked for execution.

Setting OPENQA_SCHEDULER_SCHEDULE_TICK_MS to a high value doesn't help.

The strace output for the busy scheduler looks like this:

getpid()                                = 28160
sendto(5, "Q\0\0\0&SELECT COUNT( * ) FROM work"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 16384, 0, NULL, NULL) = 63
getpid()                                = 28160
getpid()                                = 28160
sendto(5, "Q\0\0\0\213SELECT me.id, me.host, me.i"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\300\0\7id\0\0\0Di\0\1\0\0\0\27\0\4\377\377\377\377\0\0host"..., 16384, 0, NULL, NULL) = 765
select(8, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)

(continues perpetually)
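
For reference, the select(...) call with a zero timeout at the end of each cycle returns immediately instead of blocking, which is consistent with a loop that never sleeps. The following is only a minimal standalone sketch of that syscall pattern, not openQA code: count_polls is a hypothetical helper, and the pipe merely stands in for a file descriptor that never becomes readable. It does not show which part of the scheduler issues the call.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper (not part of openQA): count how often a select() call
# with the given timeout returns within $window seconds while the watched
# handle never becomes readable.
sub count_polls {
    my ($timeout, $window) = @_;

    # Nothing is ever written to this pipe, so the reader is never readable.
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $select = IO::Select->new($reader);

    my $deadline = time + $window;
    my $polls    = 0;
    while (time < $deadline) {
        # With a timeout of 0 this returns immediately, like the
        # select(..., {tv_sec=0, tv_usec=0}) = 0 (Timeout) line in the trace.
        $select->can_read($timeout);
        $polls++;
    }

    close $writer;
    close $reader;
    return $polls;
}

printf "timeout 0:   %d polls\n", count_polls(0,   0.2);
printf "timeout 0.1: %d polls\n", count_polls(0.1, 0.2);
```

With a zero timeout the loop spins as fast as the CPU allows, whereas a non-zero timeout lets the process sleep between iterations.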

I suspect the output corresponds to the following lines of Perl code in Scheduler.pm:

my $all_workers = schema->resultset("Workers")->count();

my @f_w = grep { !$_->dead && ($_->websocket_api_version() || 0) == WEBSOCKET_API_VERSION }
    schema->resultset("Workers")->search({job_id => undef})->all();
Actions #1

Updated by mkittler about 6 years ago

  • Assignee set to mkittler
  • Target version set to Ready
Actions #2

Updated by mkittler about 6 years ago

  • Status changed from New to In Progress
Actions #3

Updated by mkittler about 6 years ago

  • Status changed from In Progress to Resolved
  • Target version changed from Ready to Done

PR has been merged