action #39560

closed

Tests for blocked_by and loops inside of it

Added by szarate over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version:
Start date: 2018-08-10
Due date:
% Done: 0%
Estimated time:

Description

PR https://github.com/os-autoinst/openQA/pull/1743 is currently open and has already been hotpatched onto osd.

We need to pick it up where coolo left it and add tests for this situation as well. Note that this might also require revisiting the changes made in https://github.com/os-autoinst/openQA/pull/1718
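
To make the kind of test meant here more concrete, below is a minimal sketch in Python. It is not openQA code and does not reflect the actual schema or the logic in the PRs above. The `Job` class, the `FINAL_STATES` set and the `blocked_by` helper are all hypothetical, and the rule it assumes (a job is blocked by the first chained ancestor that has not reached a final state) is only an illustration. It covers the two cases this ticket is about: a child with more than one chained parent, and a loop in the dependency chain that must not hang the lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical set of states in which a parent no longer blocks its children.
FINAL_STATES = {"done", "cancelled"}


@dataclass
class Job:
    """Simplified stand-in for a scheduled job (not the openQA schema)."""
    id: int
    state: str = "scheduled"
    chained_parents: List[int] = field(default_factory=list)


def blocked_by(job_id: int, jobs: Dict[int, Job]) -> Optional[int]:
    """Return the id of the first unfinished chained ancestor, or None.

    Ancestors are visited depth-first, lowest id first. The `seen` set
    guarantees termination even if the dependencies contain a loop.
    """
    seen = {job_id}
    stack = sorted(jobs[job_id].chained_parents, reverse=True)
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in jobs:
            continue  # already visited (loop) or parent missing from the model
        seen.add(pid)
        parent = jobs[pid]
        if parent.state not in FINAL_STATES:
            return pid
        stack.extend(sorted(parent.chained_parents, reverse=True))
    return None


# Case 1: a child with two chained parents, one of which is still running.
jobs = {
    1: Job(1, state="done"),
    2: Job(2, state="running"),
    3: Job(3, chained_parents=[1, 2]),
}
first_blocker = blocked_by(3, jobs)
jobs[2].state = "done"
after_parents_done = blocked_by(3, jobs)

# Case 2: two jobs that (wrongly) list each other as chained parents.
looped = {
    4: Job(4, state="done", chained_parents=[5]),
    5: Job(5, state="done", chained_parents=[4]),
}
loop_result = blocked_by(4, looped)
```

In this model the child stays blocked until every chained parent is finished, and the looped pair simply reports no blocker instead of recursing forever. Whether a loop should be reported as an error instead is a design decision this sketch leaves open.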


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #32725: [tools] Scheduler job_grab/filter_jobs refactoring (Resolved, 2018-05-05)
Related to openQA Project - action #39629: openQA Scheduler refactor fallout (Resolved, 2018-08-13)

Actions #1

Updated by szarate over 5 years ago

  • Description updated (diff)
Actions #2

Updated by EDiGiacinto over 5 years ago

Also https://github.com/os-autoinst/openQA/pull/1717 is related to it.

From IRC:

<mudler>that PR is hotpatched in osd already right? https://openqa.suse.de/tests/1917358#settings -> not sure, but looks like it's not catching more-than-one chained parents still 
<mudler>as one of the parent is uploading now, but it went running and missed asset
<mudler>(they were both running in parallel, now it failed the parent, for other reasons)
<coolo>mudler: what's also possible is that the blocked_by wasn't even calculated - depending how the job was created
<mudler>it's also somehow missing the parents.. i mean they should be two, no?
<foursixnine>coolo: I think blocked_by is depending on using isos post/job duplicate right?
<mudler>but if i read correctly, https://openqa.suse.de/tests/1913372#settings is the one that was posted
<mudler>and misses parents as well
<coolo>mudler: if it's missing parents, that's a completely different part then
<mudler>my point :) but i guess there are two bugs then, because even if the parent was in the DB it didn't waited for it
<coolo>I don't think there is a parent - so no waiting
Actions #3

Updated by EDiGiacinto over 5 years ago

  • Related to action #32725: [tools] Scheduler job_grab/filter_jobs refactoring added
Actions #4

Updated by EDiGiacinto over 5 years ago

For the sake of reference: even with that PR, things still seem pretty broken:

1) In certain cases child jobs no longer wait for their parents before running. This is still to be bisected, but I believe it is the result of two different bugs in two different parts of the code, e.g. https://openqa.suse.de/tests/1917358

2) Stale jobs stay in the running state forever. After a few days of having this in production, bugs are back that had already been addressed in the previous scheduler logic, which had to cope with production loads (maybe we simplified too much here?). See e.g. https://openqa.suse.de/tests/1925778, but we had plenty of them showing 'State: running finished 3 days ago ( 00:03 minutes )' or similar.

3) Having a separate way to represent the cluster is confusing now. The settings page shows something that is not consistent with what the scheduler actually considers, which makes debugging even messier: you see one thing and the scheduler does another. From my point of view this is a big -1, as it makes things a lot more counterintuitive (just my opinion).

4) The tests were bent to make them pass with the new scheduler logic, which makes me wonder whether this works as expected, since the core logic is not covered by unit tests.

IMHO this is kind of a no-go at this point.

Actions #5

Updated by szarate over 5 years ago

  • Related to action #39629: openQA Scheduler refactor fallout added
Actions #6

Updated by coolo over 5 years ago

  • Status changed from New to Resolved

I added 2 more test cases for blocked_by, and it looks good in production, so let's resolve it.

Actions #7

Updated by coolo over 5 years ago

  • Target version changed from Current Sprint to Done