action #39629

closed

openQA Scheduler refactor fallout

Added by szarate over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: Feature requests
Target version:
Start date: 2018-08-13
Due date:
% Done: 0%
Estimated time:

Description

This is a general ticket to track problems with the new scheduler, which adds support for blocked_by and was deployed last week.

The currently known problems mostly involve jobs that are run while their parent has not even started.
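
To make the expected behaviour concrete, here is a minimal sketch of what blocked_by gating is supposed to guarantee: a scheduled job is only handed out once the job it is blocked by has reached a final state. This is not the openQA implementation. The `Job` class, the state names, and the `schedulable` helper are all hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical set of states in which a blocking job no longer blocks.
FINAL_STATES = {"done", "cancelled"}


@dataclass
class Job:
    """Hypothetical, simplified job record (not the openQA schema)."""
    id: int
    state: str = "scheduled"
    blocked_by_id: Optional[int] = None


def schedulable(jobs: Dict[int, Job]) -> List[int]:
    """Return the ids of scheduled jobs that are not blocked by an unfinished job."""
    ready = []
    for job in jobs.values():
        if job.state != "scheduled":
            continue
        blocker = None
        if job.blocked_by_id is not None:
            # Assumption: a reference to an unknown job does not block.
            blocker = jobs.get(job.blocked_by_id)
        if blocker is not None and blocker.state not in FINAL_STATES:
            # The parent has not finished (or not even started): keep waiting.
            continue
        ready.append(job.id)
    return sorted(ready)
```

With a parent job 1 and a child job 2 blocked by it, only job 1 is schedulable until job 1 reaches a final state. The problems tracked here are cases where the child ran anyway.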


Related issues 3 (0 open, 3 closed)

Related to openQA Project (public) - action #32725: [tools] Scheduler job_grab/filter_jobs refactoring (Resolved, 2018-05-05)
Related to openQA Project (public) - action #39560: Tests for blocked_by and loops inside of it (Resolved, 2018-08-10)
Related to openQA Project (public) - action #39068: Webui killed by out of memory in o3 (triggered by postgresql) (Rejected, 2018-08-01)

Actions #1

Updated by szarate over 6 years ago

  • Description updated (diff)
Actions #2

Updated by szarate over 6 years ago

  • Description updated (diff)
Actions #3

Updated by EDiGiacinto over 6 years ago

  • Related to action #32725: [tools] Scheduler job_grab/filter_jobs refactoring added
Actions #4

Updated by szarate over 6 years ago

  • Related to action #39560: Tests for blocked_by and loops inside of it added
Actions #5

Updated by szarate over 6 years ago

After a full build, we saw jobs that were missing certain parts:

Because of these and many others, with a beta on top, it was decided to revert the changes (at OBS level) and deploy the reverted version on OSD for the time being, while we take a closer look at the whole set of blocked_by changes.

https://progress.opensuse.org/issues/39560#note-4

Actions #6

Updated by EDiGiacinto over 6 years ago

Also seen: jobs stuck in assigned (and still in that state):

Actions #7

Updated by szarate over 6 years ago

  • Related to action #39068: Webui killed by out of memory in o3 (triggered by postgresql) added
Actions #8

Updated by coolo over 6 years ago

  • Status changed from New to Resolved

In the second round we found several bugs, which have been fixed, and the scheduler is now 'good enough' in production. We have 2 more issues to be fixed in future sprints though:

  • Usability of how reviewers can debug cluster scheduling (#40772)
  • Starvation of multimachine jobs (#48011)

https://progress.opensuse.org/issues/40904 needs to be fixed in the spec file

Actions #9

Updated by coolo over 6 years ago

  • Target version changed from Current Sprint to Done