action #39629

openQA Scheduler refactor fallout

Added by szarate over 1 year ago. Updated over 1 year ago.

Status: Resolved
Start date: 13/08/2018
Priority: Immediate
Due date:
Assignee: -
% Done: 0%
Category: Feature requests
Target version: Done
Difficulty:
Duration:

Description

This is a general ticket to track problems with the new scheduler (with support for blocked_by), which was deployed last week.

The currently known problems mostly concern jobs that are run while their parent has not even started.


Related issues

Related to openQA Project - action #32725: [tools] Scheduler job_grab/filter_jobs refactoring Resolved 05/05/2018
Related to openQA Project - action #39560: Tests for blocked_by and loops inside of it Resolved 10/08/2018
Related to openQA Project - action #39068: Webui killed by out of memory in o3 (triggered by postgre... Rejected 01/08/2018

History

#1 Updated by szarate over 1 year ago

  • Description updated (diff)

#2 Updated by szarate over 1 year ago

  • Description updated (diff)

#3 Updated by EDiGiacinto over 1 year ago

  • Related to action #32725: [tools] Scheduler job_grab/filter_jobs refactoring added

#4 Updated by szarate over 1 year ago

  • Related to action #39560: Tests for blocked_by and loops inside of it added

#5 Updated by szarate over 1 year ago

As a result, after having a full build and seeing jobs that were missing certain parts:

And many others, with a beta on top, it was decided to revert the changes (at OBS level) and deploy them in OSD for the time being, while we take a closer look at the whole set of blocked_by changes.

https://progress.opensuse.org/issues/39560#note-4

#6 Updated by EDiGiacinto over 1 year ago

Also, jobs stuck in assigned (still in that condition):

#7 Updated by szarate over 1 year ago

  • Related to action #39068: Webui killed by out of memory in o3 (triggered by postgresql) added

#8 Updated by coolo over 1 year ago

  • Status changed from New to Resolved

In the second round we found several bugs, which have been fixed; the scheduler is now 'good enough' in production. Two more issues remain to be fixed in future sprints, though:
- Usability of how cluster scheduling is to be debugged by reviewers (#40772)
- Starvation of multimachine jobs (#48011)

https://progress.opensuse.org/issues/40904 needs to be fixed in the spec file

#9 Updated by coolo over 1 year ago

  • Target version changed from Current Sprint to Done
