action #124469
Allow partial product retrigger size:M
0%
Description
Motivation
Fixing job failures sometimes requires editing medium and testsuite settings. It'd be useful to have a job restart option that behaves like a partial isos post but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original isos post settings and the current testsuite/medium/job group configuration. Unlike a normal restart, job settings of the original failed/cancelled jobs would be ignored.
Acceptance criteria
- AC1: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)
- AC2: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)
Suggestions
- Follow comments in the ticket
History
#4
Updated by MDoucha 3 months ago
Example:
Here we have a scheduled product which generated 238 jobs across multiple job groups: https://openqa.suse.de/admin/productlog?id=1740102
And from that scheduled product, I want to retrigger (automatically reload any testsuite/medium/job group changes) just this job and its 13 children: https://openqa.suse.de/tests/10596043
#5
Updated by mkittler 3 months ago
- Description updated (diff)
- Status changed from New to Feedback
- Assignee set to mkittler
Maybe something like this is already possible?
So it would be something like:
1. You get all settings from the original scheduled product.
2. You compile the list of jobs you care about somehow as a TEST setting, e.g. TEST=parent_job1,child_job1,child_job2.
3. You make a new isos post call with settings from 1. and 2.
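The three steps above could be sketched roughly as follows. This is only an illustration: the function name `build_isos_post_command`, the host `https://openqa.example.com` and the product setting values are hypothetical, and the sketch assumes the call is made through `openqa-cli api -X POST isos` with `KEY=value` arguments. It only builds the argument list and does not contact any server.

```python
def build_isos_post_command(product_settings, test_names,
                            host="https://openqa.example.com"):
    """Build an argv list for a new isos post limited to selected tests.

    product_settings: settings taken from the original scheduled product (step 1)
    test_names: the hand-compiled list of jobs to schedule (step 2)
    host: hypothetical openQA instance, replace with the real one
    """
    settings = dict(product_settings)        # step 1: start from the original settings
    settings["TEST"] = ",".join(test_names)  # step 2: comma-separated TEST list
    # step 3: a new isos post call combining both
    argv = ["openqa-cli", "api", "--host", host, "-X", "POST", "isos"]
    argv += [f"{key}={value}" for key, value in sorted(settings.items())]
    return argv


if __name__ == "__main__":
    # Hypothetical product settings, not taken from the scheduled product in this ticket
    original = {"DISTRI": "sle", "VERSION": "15-SP5", "FLAVOR": "Online",
                "ARCH": "x86_64", "BUILD": "1.2"}
    print(" ".join(build_isos_post_command(
        original, ["parent_job1", "child_job1", "child_job2"])))
```

The TEST list is static here, which is exactly the weakness of this manual approach.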
Of course compiling the list of jobs you care about is the tricky part. For instance, in your example this list of children might even change when you amend testsuite settings. So it would really need to be re-computed, and a static list like in 2. would not work. So I guess it is not possible right now. Besides, doing step 2. manually is very tedious (even if the list of children won't change).
This leads to the question: How do you want to select the relevant subset of jobs?
#8
Updated by mkittler 3 months ago
How do you want to select the relevant subset of jobs?
I suppose you could specify the target jobs via TEST=… as you already can right now (the variable supports a comma-separated list). To take children of those jobs into account as well you'd also specify _INCLUDE_CHILDREN=1, which would be a new setting. It would include all kinds of children (chained, parallel, directly chained) and recursively cover all children. Is that good enough or do you need to select by dependency type and limit the depth?
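To make the proposed behaviour concrete, here is a minimal sketch of what "recursively cover all children of all kinds" could mean. It is a model of the idea only, not openQA's implementation: the function `collect_with_children`, the `children_of` mapping and the dependency type labels used as dictionary keys are all assumptions made for illustration.

```python
from collections import deque


def collect_with_children(targets, children_of):
    """Return the targets plus all of their descendants, in breadth-first order.

    targets: test names given via TEST=...
    children_of: hypothetical mapping of test name -> {dependency type: [child names]}
    """
    selected = []
    seen = set()
    queue = deque(targets)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # a job reachable via several paths is only selected once
        seen.add(name)
        selected.append(name)
        # Follow every dependency type without filtering and without a depth limit
        for children in children_of.get(name, {}).values():
            queue.extend(children)
    return selected


if __name__ == "__main__":
    children_of = {
        "parent_job1": {"chained": ["child_job1"], "parallel": ["child_job2"]},
        "child_job1": {"directly_chained": ["grandchild_job1"]},
    }
    print(collect_with_children(["parent_job1"], children_of))
```

Selecting by dependency type or limiting the depth would mean filtering inside the inner loop, which is the extra complexity the question above is about.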
I'm not sure how hard/easy that would be to implement right now and I'm also wondering how hard/easy it would be after https://github.com/os-autoinst/openQA/pull/4999 has been merged.
An additional usability issue would be that we currently only allow re-triggering a scheduled product with the same settings as before. So you'd have to manually craft a new isos post command. Is that acceptable?
#9
Updated by MDoucha 3 months ago
mkittler wrote:
This leads to the question: How do you want to select the relevant subset of jobs?
Just like in the example I gave above, I'll typically want to retrigger a specific job and its dependency subtree. So I want to select by single job ID. Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.
I don't care whether the full retriggered job list will be the original one or the new one from job group config. But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.
#10
Updated by mkittler 3 months ago
So I want to select by single job ID.
Ok, by job ID. That means the code which is scheduling a product would need to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?
Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.
I suppose that would be nice. It would be "Re-trigger scheduled product from here including child jobs". Since this is not a "restart" in the usual sense it may deserve a distinct icon/button, especially since this way of re-triggering is also allowed if the job has already been restarted in the usual sense.
I don't care whether the full retriggered job list will be the original one or the new one from job group config.
Then I'd make it the new one because that will be way easier to implement. (After all we'd just run a filtered version of the code for scheduling an ISO again.)
But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.
Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.
I'm not sure how hard/easy that would be to implement right now …
Judging by the code we currently pull parents into the set of jobs to be triggered. So this feature would "just" be the reverse. I'll have to experiment myself to see whether this is actually how it behaves (because I might misunderstand the code). If it was true, then it might be easy to implement. It would also mean that https://github.com/os-autoinst/openQA/pull/4999 is not in the way (as the current implementation already de-duplicates this kind of dependency handling).
#11
Updated by MDoucha 3 months ago
mkittler wrote:
Ok, by job ID. That means the code which is scheduling a product would need to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?
It'd be less convenient for the user to not allow job ID. If you need to match internally by TEST name, you can simply load it from the existing job.
Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.
If the whole dependency subtree including the target job got removed, then there's nothing to retrigger if you strictly follow the new job group config. You could still optionally force-retrigger the target job itself with empty settings from the job group (so only testsuite and medium settings will apply).
#12
Updated by mkittler 3 months ago
Ok. Since this touches the same code as https://github.com/os-autoinst/openQA/pull/4999 (merged) I'd like to have this PR merged first. I'll give it a try when I have time although of course other issues have priority.
#13
Updated by mkittler 3 months ago
I'll typically want to retrigger a specific job and its dependency subtree.
Looks like there's the existing scheduling variable _SKIP_CHAINED_DEPS that does exactly that. Except that it only affects chained and directly chained dependencies. You'd still get parallel parents. However, that's supposedly even what you actually want (because only re-triggering the parallel child without its parent is likely useless).
However, besides adding UI for using _SKIP_CHAINED_DEPS more conveniently, there's still something missing. If we specified the "starting point" via the TEST variable we'd actually only get that job and no children. So we need a mechanism to pull in children automatically in the same way we currently pull in parents automatically (unless _SKIP_CHAINED_DEPS is specified).
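The parent handling described in this comment can be modelled with a small sketch. It only mirrors the behaviour as stated here (parents are pulled in automatically, _SKIP_CHAINED_DEPS suppresses chained and directly chained parents, parallel parents are still pulled in) and is not openQA's actual code. The function `pull_in_parents`, the `parents_of` mapping and the dependency type labels are assumptions, as is the choice to apply the skip to parents of parents as well.

```python
def pull_in_parents(targets, parents_of, skip_chained_deps=False):
    """Return the targets plus the parents that would be scheduled with them.

    targets: test names given via TEST=...
    parents_of: hypothetical mapping of test name -> {dependency type: [parent names]}
    skip_chained_deps: models _SKIP_CHAINED_DEPS being specified
    """
    selected = set(targets)
    stack = list(targets)
    while stack:
        name = stack.pop()
        for dep_type, parents in parents_of.get(name, {}).items():
            # With the flag set, chained and directly chained parents are left out;
            # parallel parents are always kept
            if skip_chained_deps and dep_type in ("chained", "directly_chained"):
                continue
            for parent in parents:
                if parent not in selected:
                    selected.add(parent)
                    stack.append(parent)
    return selected


if __name__ == "__main__":
    parents_of = {"child_job1": {"chained": ["install_job"],
                                 "parallel": ["server_job"]}}
    print(sorted(pull_in_parents(["child_job1"], parents_of)))
    print(sorted(pull_in_parents(["child_job1"], parents_of, skip_chained_deps=True)))
```

The missing piece named above would be the same kind of traversal in the opposite direction, following children instead of parents.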
#15
Updated by okurz about 2 months ago
- Status changed from Feedback to Workable
#16
Updated by mkittler about 1 month ago
- Status changed from Workable to Feedback
A draft with the simplest change/test I can think of to implement the "missing bit" mentioned in the last paragraph of my previous comment: https://github.com/os-autoinst/openQA/pull/5096
#17
Updated by okurz about 1 month ago
- Status changed from Feedback to Workable
#18
Updated by mkittler 5 days ago
- Status changed from Workable to Feedback
I've been updating https://github.com/os-autoinst/openQA/pull/5096. I'm waiting for feedback before continuing.