action #43991
Scheduler stays busy after restarting/cloning a job (closed)
Description
After restarting or cloning a job, the scheduler stays busy: it keeps looping instead of going idle. It does not matter whether the job actually gets executed or just remains in the "scheduled" state because no worker is available.
Restarting the scheduler helps. If jobs are already scheduled when the scheduler is restarted, and a worker then becomes available and starts running one of them, the issue is not triggered. So apparently only adding a new job to the list of scheduled jobs triggers the problem, not picking an already scheduled job for execution.
Setting OPENQA_SCHEDULER_SCHEDULE_TICK_MS to a high value does not help.
The strace output for the busy scheduler looks like this:
getpid() = 28160
sendto(5, "Q\0\0\0&SELECT COUNT( * ) FROM work"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 16384, 0, NULL, NULL) = 63
getpid() = 28160
getpid() = 28160
sendto(5, "Q\0\0\0\213SELECT me.id, me.host, me.i"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\300\0\7id\0\0\0Di\0\1\0\0\0\27\0\4\377\377\377\377\0\0host"..., 16384, 0, NULL, NULL) = 765
select(8, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
(continues perpetually)
I suspect the output corresponds to the following lines of Perl code in Scheduler.pm:
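# Count all registered workers (presumably the "SELECT COUNT( * ) FROM work..." query in the strace)
my $all_workers = schema->resultset("Workers")->count();
# Fetch idle workers (job_id IS NULL), presumably the "SELECT me.id, me.host, ..." query,
# then keep only live workers that speak the expected websocket API version
my @f_w = grep { !$_->dead && ($_->websocket_api_version() || 0) == WEBSOCKET_API_VERSION }
  schema->resultset("Workers")->search({job_id => undef})->all();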
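The last call of each iteration in the strace output, select(8, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout), polls with a zero timeout: it returns immediately instead of waiting for activity on fd 3. A loop built around such a call never sleeps, which would be consistent with the busy scheduler. The following Perl sketch is not openQA code; iterations_in_window is a hypothetical helper. It only illustrates the symptom by contrasting a zero timeout with a 50 ms timeout on an idle pipe, using the core IO::Select and Time::HiRes modules, and does not show where in the scheduler a zero timeout would come from.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper (not part of openQA): count how many loop iterations fit
# into a fixed wall-clock window when each iteration polls an idle file
# descriptor with the given select() timeout in seconds.
sub iterations_in_window {
    my ($timeout, $window) = @_;

    # A pipe nobody writes to: its read end never becomes readable, so
    # select() only ever returns because its timeout expired.
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $sel = IO::Select->new($reader);

    my $count    = 0;
    my $deadline = time() + $window;
    while (time() < $deadline) {
        # With $timeout == 0 this issues select() with a zero timeval,
        # like the call seen in the strace, and returns immediately.
        $sel->can_read($timeout);
        $count++;
    }
    close $reader;
    close $writer;
    return $count;
}

my $busy     = iterations_in_window(0,    0.2);  # never sleeps: spins on the CPU
my $blocking = iterations_in_window(0.05, 0.2);  # sleeps about 50 ms per iteration

printf "timeout 0:     %d iterations in 200 ms\n", $busy;
printf "timeout 50 ms: %d iterations in 200 ms\n", $blocking;
```

The zero-timeout count should be orders of magnitude larger than the handful of iterations of the 50 ms variant, which is the difference between a loop that spins and one that idles.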
Updated by mkittler about 6 years ago
- Assignee set to mkittler
- Target version set to Ready
Updated by mkittler about 6 years ago
- Status changed from New to In Progress
Updated by mkittler about 6 years ago
- Status changed from In Progress to Resolved
- Target version changed from Ready to Done
PR has been merged