coordination #32851

[tools][EPIC] Scheduling redesign

Added by EDiGiacinto over 3 years ago. Updated 12 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2018-05-05
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

Currently we schedule jobs from the DB using a specifically crafted ORM query, which is costly in terms of both CPU and memory. Refining the query further could be a dead end in the long term, and it may become problematic as more requirements are placed on it.

This ticket is meant only as a tracker to group refactoring/enhancement and redesign proposals.


Subtasks

coordination #12876: [epic] Offer a way for jobs to dynamically schedule children (Rejected, okurz)

action #32725: [tools] Scheduler job_grab/filter_jobs refactoring (Resolved)

action #27454: [tools][scheduling] Worker's seen DB field is ignored by WebSocket server when checking for stale jobs (Resolved, mkittler)


Related issues

Related to openQA Project - action #20812: Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere (Resolved)

Related to openQA Project - action #25970: Profile/Optimize _workers_checker in WebSockets server (Resolved, 2017-10-11)

Related to openQA Project - action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. (Resolved, 2017-12-01)

Related to openQA Project - action #31069: Job life cycle not always covered by events (Resolved, 2018-01-30)

Related to openQA Project - action #25124: [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever (Resolved, 2017-09-08)

Related to openQA Project - action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." (Rejected, 2018-04-20)

Related to openQA Project - action #36727: job_grab does not cope with parallel cycles (Resolved, 2018-06-04)

Follows openQA Project - action #35914: Changes to Job::duplicate (Resolved, 2018-05-04)

History

#1 Updated by EDiGiacinto over 3 years ago

My 2c: with regard to replacing the DB and doing the scheduling in memory, if AMQP is not the way to go (which would also mean replacing how jobs are dispatched over WebSocket), I would explore switching to a SAT-solving mechanism instead, so that we avoid hard-coding conditions ourselves in the future. As I see it, we can reformulate our problem as conditions that can be nicely expressed in CNF.
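
To make the CNF idea concrete, here is a minimal, hypothetical sketch; it is not how openQA schedules jobs. It assumes a simplified model that is not from this ticket: each job requires exactly one WORKER_CLASS value, each worker runs at most one job, and every pending job must be placed (a real scheduler would rather leave unplaceable jobs queued). A boolean variable x(job, worker) means "job runs on worker". The helpers encode, dpll and schedule are illustrative names, and the toy DPLL solver stands in for a real SAT solver.

```python
from itertools import combinations

# Hypothetical sketch: job/worker assignment encoded as CNF.
# Literals are non-zero ints (DIMACS style); a negative int is a negated variable.


def encode(jobs, workers):
    """Build CNF clauses for assigning jobs to workers.

    jobs:    dict job name -> required WORKER_CLASS (assumption: one class per job)
    workers: dict worker name -> set of WORKER_CLASS values it provides
    Returns (clauses, var_map) with var_map[(job, worker)] -> variable id.
    """
    var_map = {}
    for job in jobs:
        for worker in workers:
            var_map[(job, worker)] = len(var_map) + 1

    clauses = []
    for job, required in jobs.items():
        # At least one worker per job (assumption: every job must be placed).
        clauses.append(frozenset(var_map[(job, w)] for w in workers))
        # At most one worker per job.
        for w1, w2 in combinations(workers, 2):
            clauses.append(frozenset({-var_map[(job, w1)], -var_map[(job, w2)]}))
        # Never on a worker lacking the required WORKER_CLASS.
        for worker, classes in workers.items():
            if required not in classes:
                clauses.append(frozenset({-var_map[(job, worker)]}))
    # At most one job per worker.
    for worker in workers:
        for j1, j2 in combinations(jobs, 2):
            clauses.append(frozenset({-var_map[(j1, worker)], -var_map[(j2, worker)]}))
    return clauses, var_map


def dpll(clauses, assignment=frozenset()):
    """Toy DPLL solver: returns a frozenset of true literals, or None if UNSAT."""
    simplified = []
    for clause in clauses:
        if clause & assignment:
            continue  # clause already satisfied
        reduced = frozenset(lit for lit in clause if -lit not in assignment)
        if not reduced:
            return None  # every literal is false: conflict
        simplified.append(reduced)
    if not simplified:
        return assignment  # all clauses satisfied
    # Unit propagation: a one-literal clause forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            return dpll(simplified, assignment | clause)
    # Otherwise branch on some unassigned literal.
    lit = next(iter(simplified[0]))
    return dpll(simplified, assignment | {lit}) or dpll(simplified, assignment | {-lit})


def schedule(jobs, workers):
    """Return a dict job -> worker, or None if no valid assignment exists."""
    clauses, var_map = encode(jobs, workers)
    model = dpll(clauses)
    if model is None:
        return None
    return {job: worker for (job, worker), var in var_map.items() if var in model}
```

For example, schedule({"job_a": "qemu_x86_64", "job_b": "qemu_aarch64"}, {"w1": {"qemu_aarch64"}, "w2": {"qemu_x86_64"}}) places job_a on w2 and job_b on w1, while two qemu_x86_64 jobs with only one matching worker are unsatisfiable and return None. New constraints (e.g. parallel clusters that must start together) would become additional clauses rather than changes to a hand-written query, which is the appeal of the approach; the cost is the extra solver machinery.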

#2 Updated by coolo over 3 years ago

Don't! The problem is way too simple for such a monster solution

#3 Updated by dasantiago over 3 years ago

  • Related to action #20812: Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere added

#4 Updated by EDiGiacinto over 3 years ago

  • Related to action #25970: Profile/Optimize _workers_checker in WebSockets server added

#5 Updated by EDiGiacinto over 3 years ago

  • Related to action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. added

#6 Updated by EDiGiacinto over 3 years ago

coolo wrote:

Don't! The problem is way too simple for such a monster solution

Well, it seems to be growing in complexity now, so maybe a simple solution is no longer enough, and it might actually help slim down the logic, as we could infer CNFs from job settings.

I'm not saying that is the road to take; it's just worth mentioning the possibilities.

#7 Updated by EDiGiacinto over 3 years ago

  • Related to action #31069: Job life cycle not always covered by events added

#9 Updated by EDiGiacinto over 3 years ago

  • Related to action #25124: [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever added

#10 Updated by EDiGiacinto over 3 years ago

  • Related to action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." added

#11 Updated by szarate over 3 years ago

  • Due date set to 2018-05-05

due to changes in a related task

#12 Updated by szarate over 3 years ago

#13 Updated by EDiGiacinto over 3 years ago

  • Related to action #36727: job_grab does not cope with parallel cycles added

#14 Updated by okurz over 2 years ago

  • Category changed from 122 to Feature requests

#15 Updated by okurz over 1 year ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The one open subtask #12876 is still a valid feature request but I doubt we need this top-level tracker as it does not provide more details.

#16 Updated by szarate 12 months ago

  • Tracker changed from action to coordination
