coordination #32851

closed

[tools][EPIC] Scheduling redesign

Added by EDiGiacinto about 6 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2018-05-05
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Currently we schedule jobs via the DB, using a specifically crafted query built with the ORM, which is expensive in terms of both CPU and memory. Refining the query further could be a dead end in the long term, and it will become problematic as more requirements are placed on it.

This ticket is meant purely as a tracker to group refactoring, enhancements and redesign proposals.


Subtasks 3 (0 open, 3 closed)

coordination #12876: [epic] Offer a way for jobs to dynamically schedule children (Rejected, okurz, 2018-05-05)
action #32725: [tools] Scheduler job_grab/filter_jobs refactoring (Resolved, 2018-05-05)
action #27454: [tools][scheduling] Worker's seen DB field is ignored by WebSocket server when checking for stale jobs (Resolved, mkittler, 2018-05-05)

Related issues 8 (0 open, 8 closed)

Related to openQA Project - action #20812: Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere (Resolved, mkittler)
Related to openQA Project - action #25970: Profile/Optimize _workers_checker in WebSockets server (Resolved, 2017-10-11)
Related to openQA Project - action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. (Resolved, mkittler, 2017-12-01)
Related to openQA Project - action #31069: Job life cycle not always covered by events (Resolved, 2018-01-30)
Related to openQA Project - action #25124: [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever (Resolved, EDiGiacinto, 2017-09-08)
Related to openQA Project - action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." (Rejected, 2018-04-20)
Related to openQA Project - action #36727: job_grab does not cope with parallel cycles (Resolved, szarate, 2018-06-04)
Follows openQA Project - action #35914: Changes to Job::duplicate (Resolved, coolo, 2018-05-04)

Actions #1

Updated by EDiGiacinto about 6 years ago

My 2c regarding replacing the DB and doing the scheduling in memory: if AMQP is not the way to go (that would also mean replacing the dispatching of jobs over ws), I would explore switching to a SAT-solving mechanism instead, so that we avoid hard-coding conditions ourselves in the future. As I see it, we can reformulate our problem as conditions that can be nicely expressed in CNF.
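
To make the CNF idea concrete, here is a minimal sketch of how job-to-worker assignment could be written as clauses. It is only an illustration under stated assumptions, not openQA code: the function names `build_cnf` and `solve`, the job and worker names, and the class values are all hypothetical, each job is assumed to require exactly one WORKER_CLASS value, and the brute-force search stands in for a real SAT solver.

```python
from itertools import combinations, product


def build_cnf(jobs, workers):
    """Encode "every job runs on exactly one suitable, otherwise idle worker" as CNF.

    jobs:    {job name: required worker class}          (hypothetical input shape)
    workers: {worker name: set of classes it provides}  (hypothetical input shape)
    Returns (var, clauses). var maps (job, worker) to a positive integer;
    a clause is a list of integers, negative meaning "NOT that variable".
    """
    var = {}
    for job in jobs:
        for worker in workers:
            var[(job, worker)] = len(var) + 1

    clauses = []
    for job, wanted in jobs.items():
        # A job must not land on a worker that lacks the required class.
        for worker, classes in workers.items():
            if wanted not in classes:
                clauses.append([-var[(job, worker)]])
        # Each job gets at least one worker ...
        clauses.append([var[(job, worker)] for worker in workers])
        # ... and at most one worker.
        for w1, w2 in combinations(workers, 2):
            clauses.append([-var[(job, w1)], -var[(job, w2)]])
    # Each worker runs at most one job.
    for worker in workers:
        for j1, j2 in combinations(jobs, 2):
            clauses.append([-var[(j1, worker)], -var[(j2, worker)]])
    return var, clauses


def solve(var, clauses):
    """Brute-force satisfiability check; only viable for tiny examples."""
    for bits in product([False, True], repeat=len(var)):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {key for key, idx in var.items() if bits[idx - 1]}
    return None  # unsatisfiable: not all jobs can be placed at once


jobs = {"job_a": "qemu_x86_64", "job_b": "qemu_aarch64"}
workers = {"worker1": {"qemu_x86_64"},
           "worker2": {"qemu_x86_64", "qemu_aarch64"}}
var, clauses = build_cnf(jobs, workers)
print(sorted(solve(var, clauses)))
# [('job_a', 'worker1'), ('job_b', 'worker2')]
```

The point of the encoding is that a new scheduling requirement becomes additional clauses rather than additional hand-written matching logic. Whether that pays off at the scale of this scheduler is exactly the open question in this thread.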

Actions #2

Updated by coolo about 6 years ago

Don't! The problem is way too simple for such a monster solution

Actions #3

Updated by dasantiago about 6 years ago

  • Related to action #20812: Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere added
Actions #4

Updated by EDiGiacinto about 6 years ago

  • Related to action #25970: Profile/Optimize _workers_checker in WebSockets server added
Actions #5

Updated by EDiGiacinto about 6 years ago

  • Related to action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. added
Actions #6

Updated by EDiGiacinto about 6 years ago

coolo wrote:

Don't! The problem is way too simple for such a monster solution

Well, it seems to be growing in complexity now, so maybe a simple solution is not enough anymore. It might actually help slim down the logic, as we could infer the CNF clauses from job settings.

Not saying that is the road to take, just that the possibilities are worth mentioning.

Actions #7

Updated by EDiGiacinto about 6 years ago

  • Related to action #31069: Job life cycle not always covered by events added
Actions #9

Updated by EDiGiacinto about 6 years ago

  • Related to action #25124: [tools][sprint 201709.1] Workers disconnects from websocket server and getting stuck: job shows as 'State: assigned' forever added
Actions #10

Updated by EDiGiacinto almost 6 years ago

  • Related to action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." added
Actions #11

Updated by szarate almost 6 years ago

  • Due date set to 2018-05-05

due to changes in a related task

Actions #12

Updated by szarate almost 6 years ago

Actions #13

Updated by EDiGiacinto almost 6 years ago

  • Related to action #36727: job_grab does not cope with parallel cycles added
Actions #14

Updated by okurz almost 5 years ago

  • Category changed from 122 to Feature requests
Actions #15

Updated by okurz about 4 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The one open subtask #12876 is still a valid feature request but I doubt we need this top-level tracker as it does not provide more details.

Actions #16

Updated by szarate over 3 years ago

  • Tracker changed from action to coordination