action #32851

[tools][EPIC] Scheduling redesign

Added by EDiGiacinto about 2 years ago. Updated 1 day ago.

Status: Resolved
Start date: 05/05/2018
Priority: Normal
Due date:
Assignee: okurz
% Done: 100%
Category: Feature requests
Target version: -
Difficulty:
Duration:

Description

Currently we schedule jobs through the DB, using a specially crafted query built with the ORM, which is costly in terms of both CPU and memory. Refining the query even further could be a dead end in the long term, and is likely to become problematic as more requirements are placed on it.

This ticket is meant only as a tracker to group refactoring, enhancement and redesign proposals.


Subtasks

action #12876: [epic] Offer a way for jobs to dynamically schedule children Rejected okurz

action #32725: [tools] Scheduler job_grab/filter_jobs refactoring Resolved

action #27454: [tools][scheduling] Worker's seen DB field is ignored by ... Resolved mkittler


Related issues

Related to openQA Project - action #20812: Jobs will be assigned to workers with wrong arch unless W... Resolved
Related to openQA Project - action #25970: Profile/Optimize _workers_checker in WebSockets server Resolved 11/10/2017
Related to openQA Project - action #28714: [tools] Investigate why sporadically job is set to scalar... Resolved 01/12/2017
Related to openQA Project - action #31069: Job life cycle not always covered by events Resolved 30/01/2018
Related to openQA Project - action #25124: [tools][sprint 201709.1] Workers disconnects from websock... Resolved 08/09/2017
Related to openQA Project - action #35296: Error messages on worker about "Use of uninitialized valu... Rejected 20/04/2018
Related to openQA Project - action #36727: job_grab does not cope with parallel cycles Resolved 04/06/2018
Follows openQA Project - action #35914: Changes to Job::duplicate Resolved 04/05/2018

History

#1 Updated by EDiGiacinto about 2 years ago

My 2c: with regard to replacing the DB and doing it in memory, if AMQP is not the way to go (which would also mean replacing job dispatching over ws), I would explore the possibility of switching to a SAT solving mechanism instead, so that we avoid hard-coding conditions ourselves in the future. As I see it, we can reformulate our problem as conditions that can be nicely expressed in CNF.
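To make the CNF idea concrete, here is a minimal sketch of how job-to-worker assignment could be written as clauses. Everything in it is an assumption for illustration: the job and worker names, the rule that a job's WORKER_CLASS must equal the worker's class, and the brute-force `solve` helper, which stands in for a real SAT solver. It is not how openQA schedules jobs.

```python
from itertools import combinations, product


def build_cnf(jobs, workers):
    """Return (clauses, var) for a job -> worker assignment problem.

    jobs and workers map a name to a WORKER_CLASS string (hypothetical
    matching rule: the two strings must be equal). Literals follow the
    DIMACS convention: variable i is the literal i, its negation is -i.
    """
    # One boolean variable per (job, worker) pair, numbered from 1.
    var = {pair: i for i, pair in enumerate(product(jobs, workers), start=1)}
    clauses = []
    for j in jobs:
        # Every job runs on at least one worker ...
        clauses.append([var[j, w] for w in workers])
        # ... and on at most one.
        for w1, w2 in combinations(workers, 2):
            clauses.append([-var[j, w1], -var[j, w2]])
    for w in workers:
        # A worker takes at most one job at a time.
        for j1, j2 in combinations(jobs, 2):
            clauses.append([-var[j1, w], -var[j2, w]])
    for j, wanted in jobs.items():
        for w, offered in workers.items():
            # A mismatching class forbids the pairing outright.
            if wanted != offered:
                clauses.append([-var[j, w]])
    return clauses, var


def solve(clauses, n_vars):
    """Brute-force satisfiability check; only usable for tiny inputs."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def schedule(jobs, workers):
    """Return a {job: worker} mapping, or None if no assignment exists."""
    clauses, var = build_cnf(jobs, workers)
    bits = solve(clauses, len(var))
    if bits is None:
        return None
    return {j: w for (j, w), i in var.items() if bits[i - 1]}


# Hypothetical example data.
jobs = {"job_a": "qemu_x86_64", "job_b": "qemu_aarch64"}
workers = {"w1": "qemu_x86_64", "w2": "qemu_aarch64", "w3": "qemu_x86_64"}
print(schedule(jobs, workers))
```

In this toy model a new requirement becomes a few more clauses rather than another branch in the scheduling code. Note that the sketch requires every job to be placed at once, so it reports no solution when there are more jobs than matching workers; a scheduler that assigns as many jobs as possible would need a different formulation.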

#2 Updated by coolo about 2 years ago

Don't! The problem is way too simple for such a monster solution

#3 Updated by dasantiago about 2 years ago

  • Related to action #20812: Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere added

#4 Updated by EDiGiacinto about 2 years ago

  • Related to action #25970: Profile/Optimize _workers_checker in WebSockets server added

#5 Updated by EDiGiacinto about 2 years ago

  • Related to action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. added

#6 Updated by EDiGiacinto about 2 years ago

coolo wrote:

Don't! The problem is way too simple for such a monster solution

Well, it seems to be growing in complexity now, so maybe a simple solution is not enough anymore. It might actually help slim down the logic, as we could infer the CNFs from job settings.

Not saying that this is the road to take, just that the possibilities are worth mentioning.

#7 Updated by EDiGiacinto about 2 years ago

  • Related to action #31069: Job life cycle not always covered by events added

#9 Updated by EDiGiacinto about 2 years ago

  • Related to action #25124: [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever added

#10 Updated by EDiGiacinto almost 2 years ago

  • Related to action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." added

#11 Updated by szarate almost 2 years ago

  • Due date set to 05/05/2018

due to changes in a related task

#12 Updated by szarate almost 2 years ago

#13 Updated by EDiGiacinto almost 2 years ago

  • Related to action #36727: job_grab does not cope with parallel cycles added

#14 Updated by okurz 10 months ago

  • Category changed from 122 to Feature requests

#15 Updated by okurz 3 days ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The one open subtask #12876 is still a valid feature request but I doubt we need this top-level tracker as it does not provide more details.
