action #43991: Scheduler stays busy after restarting/cloning a job - openQA Project (public) - openSUSE Project Management Tool

action #43991

closed

Scheduler stays busy after restarting/cloning a job

Added by mkittler over 6 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

mkittler

Category:

Regressions/Crashes

Target version:

Done

Start date:

2018-11-19

Due date:

% Done:

Estimated time:

Description

After restarting or cloning a job, the scheduler stays busy. It does not matter whether the job actually gets executed or just stays in the "scheduled" state because no worker is available.

Restarting the scheduler helps. If jobs are already scheduled when the scheduler is restarted, and a worker then becomes available and starts running one of them, the issue isn't triggered. So apparently only adding a new job to the list of scheduled jobs triggers the issue, not picking a scheduled job for execution.

Setting OPENQA_SCHEDULER_SCHEDULE_TICK_MS to a high value doesn't help.

The strace output for the busy scheduler looks like this:

getpid()                                = 28160
sendto(5, "Q\0\0\0&SELECT COUNT( * ) FROM work"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 16384, 0, NULL, NULL) = 63
getpid()                                = 28160
getpid()                                = 28160
sendto(5, "Q\0\0\0\213SELECT me.id, me.host, me.i"..., 140, MSG_NOSIGNAL, NULL, 0) = 140
poll([{fd=5, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "T\0\0\0\300\0\7id\0\0\0Di\0\1\0\0\0\27\0\4\377\377\377\377\0\0host"..., 16384, 0, NULL, NULL) = 765
select(8, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)

(continues perpetually)
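
The two sendto calls in the trace look like PostgreSQL simple-query messages: a one-byte type ('Q') followed by a 4-byte big-endian length that counts itself and the payload but not the type byte. The following is a minimal sketch for checking that reading against the byte counts strace reports. The helper name parse_pg_header is hypothetical, and the input strings are only the header bytes copied from the trace above.

```perl
use strict;
use warnings;

# Hypothetical helper: split the first five bytes of a PostgreSQL
# frontend message into its type byte and declared length.
sub parse_pg_header {
    my ($bytes) = @_;
    my ($type, $length) = unpack('a1 N', $bytes);
    # The declared length includes its own 4 bytes and the payload,
    # but not the type byte, so the size on the wire is length + 1.
    return { type => $type, length => $length, total => $length + 1 };
}

# Header bytes as printed by strace ("&" is 0x26, "\213" is octal for 139).
my $count_query  = parse_pg_header("Q\0\0\0&");
my $worker_query = parse_pg_header("Q\0\0\0\213");

for my $msg ($count_query, $worker_query) {
    printf "%s: declared length %d, %d bytes on the wire\n",
        $msg->{type}, $msg->{length}, $msg->{total};
}
```

The totals come out as 39 and 140 bytes, matching the return values of the two sendto calls, which is consistent with each call carrying exactly one complete query.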

I suspect the output corresponds to the following lines of Perl code in Scheduler.pm:

my $all_workers = schema->resultset("Workers")->count();

my @f_w = grep { !$_->dead && ($_->websocket_api_version() || 0) == WEBSOCKET_API_VERSION }
    schema->resultset("Workers")->search({job_id => undef})->all();
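
To make the filter in the snippet above easier to follow without a database, here is a small self-contained sketch of the same grep condition. MockWorker is a hypothetical stand-in for a row of the Workers result set, and the value 1 for WEBSOCKET_API_VERSION is an assumption made only for this illustration; the real constant is defined in the openQA code base.

```perl
use strict;
use warnings;

# Assumed value for illustration only.
use constant WEBSOCKET_API_VERSION => 1;

# Hypothetical stand-in for a Workers row; not the real result class.
package MockWorker {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
    sub dead { $_[0]{dead} }
    sub websocket_api_version { $_[0]{api_version} }
}

# Pretend these are the rows returned by search({job_id => undef}).
my @idle_workers = (
    MockWorker->new(id => 1, dead => 0, api_version => 1),
    MockWorker->new(id => 2, dead => 1, api_version => 1),        # dead
    MockWorker->new(id => 3, dead => 0, api_version => undef),    # no API version
);

# Same condition as in the snippet: alive and speaking the expected API version.
my @free_workers = grep { !$_->dead && ($_->websocket_api_version() || 0) == WEBSOCKET_API_VERSION } @idle_workers;

print join(',', map { $_->{id} } @free_workers), "\n";
```

Only the worker with id 1 passes the filter. In the real code the two statements issue one COUNT query and one SELECT on the workers table, which fits the pair of queries visible in the strace output.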