action #124469
Allow partial product retrigger size:M (closed)
Description
Motivation
Fixing job failures sometimes requires editing medium and testsuite settings. It would be useful to have a job restart option that behaves like a partial isos post, but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original isos post settings and the current testsuite/medium/job group configuration. Unlike a normal restart, the job settings of the original failed/cancelled jobs would be ignored.
Acceptance criteria
- AC1: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)
- AC2: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)
Suggestions
- Follow comments in the ticket
Updated by okurz over 1 year ago
- Tags set to reactive work
- Category changed from Feature requests to Support
- Target version set to Ready
Maybe something like this is already possible? Hence I'm adding the ticket to the backlog as "Support" to find out.
Updated by mkittler over 1 year ago
A concrete example would also have been great. At least I am not quite sure how this is supposed to work in detail.
Updated by MDoucha over 1 year ago
Example:
Here we have a scheduled product which generated 238 jobs across multiple job groups: https://openqa.suse.de/admin/productlog?id=1740102
And from that scheduled product, I want to retrigger (automatically reload any testsuite/medium/job group changes) just this job and its 13 children: https://openqa.suse.de/tests/10596043
Updated by mkittler over 1 year ago
- Description updated (diff)
- Status changed from New to Feedback
- Assignee set to mkittler
maybe something like this is already possible?
So it would be something like:
1. You get all settings from the original scheduled product.
2. You compile the list of jobs you care about somehow as a TEST setting, e.g. TEST=parent_job1,child_job1,child_job2.
3. You make a new isos post call with the settings from 1. and 2.
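Steps 1 to 3 can be sketched as a small settings transformation. This is a minimal illustration, assuming the original scheduled product's settings are available as a plain dictionary; the function name `build_partial_settings` and the example values are hypothetical, and actually sending the isos post request is left out.

```python
def build_partial_settings(original: dict[str, str], tests: list[str]) -> dict[str, str]:
    """Hypothetical helper: restrict a copy of the original settings to some TEST names."""
    settings = dict(original)  # step 1: start from the original scheduled product's settings
    settings["TEST"] = ",".join(tests)  # step 2: TEST accepts a comma-separated list
    return settings  # step 3: these would be the parameters of the new isos post call


# Made-up original settings, for illustration only.
original = {"DISTRI": "example", "VERSION": "1", "FLAVOR": "DVD", "ARCH": "x86_64", "BUILD": "42"}
settings = build_partial_settings(original, ["parent_job1", "child_job1", "child_job2"])
print(settings["TEST"])  # prints parent_job1,child_job1,child_job2
```

Note that the list in step 2 is fixed at the time it is written down, so it cannot follow changes to the set of children caused by amended testsuite settings.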
Of course, compiling the list of jobs you care about is the tricky part. For instance, in your example this list of children might even change when you amend testsuite settings. So it would really need to be re-computed, and a static list like in 2. would not work. So I guess it is not possible right now. Besides, doing step 2. manually is very tedious (even if the list of children didn't change).
This leads to the question: How do you want to select the relevant subset of jobs?
Updated by mkittler over 1 year ago
I'm keeping it as a support ticket. Once AC1 is clarified, we should estimate it.
Updated by mkittler over 1 year ago
How do you want to select the relevant subset of jobs?
I suppose you could specify the target jobs via TEST=… as you already can right now (the variable supports a comma-separated list). To take children of those jobs into account as well, you'd also specify _INCLUDE_CHILDREN=1, which would be a new setting. It would include all kinds of children (chained, parallel, directly chained) and recursively cover all children. Is that good enough, or do you need to select by dependency type and limit the depth?
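The proposed _INCLUDE_CHILDREN=1 behaviour amounts to a recursive traversal from the target jobs over all dependency types. The following is a rough sketch, not openQA's implementation: the dependency graph, its TEST names and the function `with_children` are hypothetical and only illustrate the set of jobs such a setting would select.

```python
from collections import deque

# Hypothetical dependency graph: parent TEST name -> [(child TEST name, dependency type)].
DEPENDENCIES: dict[str, list[tuple[str, str]]] = {
    "install": [("textmode", "chained"), ("gnome", "directly_chained")],
    "textmode": [("ltp_syscalls", "chained")],
    "server": [("client", "parallel")],
}


def with_children(targets: list[str], graph: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return the targets plus all their children, recursively, in breadth-first order."""
    order = list(dict.fromkeys(targets))  # de-duplicate while keeping the given order
    seen = set(order)
    queue = deque(order)
    while queue:
        job = queue.popleft()
        # Every dependency type (chained, directly chained, parallel) is followed.
        for child, _dep_type in graph.get(job, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(with_children(["install"], DEPENDENCIES))
# prints ['install', 'textmode', 'gnome', 'ltp_syscalls']
```

Selecting by dependency type or limiting the depth would only require filtering on `_dep_type` or tracking a level counter in the queue.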
I'm not sure how hard/easy that would be to implement right now and I'm also wondering how hard/easy it would be after https://github.com/os-autoinst/openQA/pull/4999 has been merged.
An additional usability issue is that we currently only allow re-triggering a scheduled product with the same settings as before. So you'd have to manually craft a new isos post command. Is that acceptable?
Updated by MDoucha over 1 year ago
mkittler wrote:
This leads to the question: How do you want to select the relevant subset of jobs?
Just like in the example I gave above, I'll typically want to retrigger a specific job and its dependency subtree. So I want to select by single job ID. Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.
I don't care whether the full retriggered job list will be the original one or the new one from job group config. But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.
Updated by mkittler over 1 year ago
So I want to select by single job ID.
Ok, by job ID. That means the code which schedules a product would need to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?
Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.
I suppose that would be nice. It would be "Re-trigger scheduled product from here including child jobs". Since this is not a "restart" in the usual sense, it may deserve a distinct icon/button, especially since this way of re-triggering would also be allowed if the job has already been restarted in the usual sense.
I don't care whether the full retriggered job list will be the original one or the new one from job group config.
Then I'd make it the new one because that will be way easier to implement. (After all we'd just run a filtered version of the code for scheduling an ISO again.)
But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.
Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.
I'm not sure how hard/easy that would be to implement right now …
Judging by the code, we currently pull parents into the set of jobs to be triggered. So this feature would "just" be the reverse. I'll have to experiment myself to see whether this is actually how it behaves (because I might misunderstand the code). If that is true, then it might be easy to implement. It would also mean that https://github.com/os-autoinst/openQA/pull/4999 is not in the way (as the current implementation already de-duplicates this kind of dependency handling).
Updated by MDoucha over 1 year ago
mkittler wrote:
Ok, by job ID. That means the code which is scheduling a product needed to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?
Not allowing a job ID would be less convenient for the user. If you need to match internally by TEST name, you can simply load it from the existing job.
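The suggested approach (accept a job ID, but match internally by TEST name) could look roughly like this. It is a sketch under stated assumptions: jobs are modelled as in-memory dictionaries of settings, the job ID 1001 and all settings are made up, and `lookup_test_name` and `match_new_jobs` are hypothetical helpers, not openQA functions.

```python
# Hypothetical existing jobs keyed by job ID; only their settings matter here.
EXISTING_JOBS: dict[int, dict[str, str]] = {
    1001: {"TEST": "install", "MACHINE": "64bit"},
}


def lookup_test_name(job_id: int, jobs: dict[int, dict[str, str]]) -> str:
    """Translate the job ID the user picked into the TEST name used for matching."""
    job = jobs.get(job_id)
    if job is None or "TEST" not in job:
        raise ValueError(f"job {job_id} not found or has no TEST setting")
    return job["TEST"]


def match_new_jobs(test_name: str, new_jobs: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the newly generated jobs that correspond to the old job by TEST name."""
    return [job for job in new_jobs if job.get("TEST") == test_name]


# Settings as the current job group config might generate them (made up).
new_jobs = [
    {"TEST": "install", "MACHINE": "64bit", "QEMURAM": "2048"},
    {"TEST": "textmode", "MACHINE": "64bit"},
]
print(match_new_jobs(lookup_test_name(1001, EXISTING_JOBS), new_jobs))
```

An empty result from `match_new_jobs` corresponds to the case where the amended settings no longer create the job at all.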
Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.
If the whole dependency subtree including the target job got removed, then there's nothing to retrigger if you strictly follow the new job group config. You could still optionally force-retrigger the target job itself with empty settings from the job group (so only testsuite and medium settings will apply).
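The two outcomes described above (report that nothing is left to retrigger, or optionally force the target job with only testsuite and medium settings) could be sketched as follows. The helper `retrigger_settings` is hypothetical, and the merge order is an assumption made for this illustration: testsuite values are assumed to override medium values.

```python
def retrigger_settings(
    test_name: str,
    new_jobs: list[dict[str, str]],
    medium: dict[str, str],
    testsuite: dict[str, str],
    force: bool = False,
) -> list[dict[str, str]]:
    """Return settings of the job(s) to retrigger for test_name (hypothetical helper)."""
    matches = [job for job in new_jobs if job.get("TEST") == test_name]
    if matches:
        return matches  # normal case: the current job group config still creates the job
    if not force:
        raise LookupError(
            f"{test_name} is no longer created by the current job group config; nothing to retrigger"
        )
    # Forced retrigger: the job group contributes no settings, so only medium and
    # testsuite settings apply. Assumption: testsuite values override medium values.
    forced = {**medium, **testsuite}
    forced["TEST"] = test_name
    return [forced]


print(retrigger_settings("install", [], {"ISO": "example.iso"}, {"DESKTOP": "textmode"}, force=True))
```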
Updated by mkittler over 1 year ago
Ok. Since this touches the same code as https://github.com/os-autoinst/openQA/pull/4999 (merged), I'd like to have that PR merged first. I'll give it a try when I have time, although other issues of course have priority.
Updated by mkittler over 1 year ago
I'll typically want to retrigger a specific job and its dependency subtree.
Looks like there's the existing scheduling variable _SKIP_CHAINED_DEPS that does exactly that, except that it only affects chained and directly chained dependencies. You'd still get parallel parents. However, that's presumably what you actually want anyway (because re-triggering only the parallel child without its parent is likely useless).
Besides adding UI for using _SKIP_CHAINED_DEPS more conveniently, there's still something missing. When we specify the "starting point" via the TEST variable, we'd actually only get that job and no children. So we need a mechanism to pull in children automatically in the same way we currently pull in parents automatically (unless _SKIP_CHAINED_DEPS is specified).
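The parent handling described above can be pictured as a traversal from the target jobs towards their parents, where _SKIP_CHAINED_DEPS prunes chained and directly chained edges but keeps parallel ones. This is only a sketch of the described behaviour, not the actual openQA scheduling code; the graph, its TEST names and `with_parents` are hypothetical. The "missing bit" would be the same kind of traversal in the child direction.

```python
from collections import deque

CHAINED_TYPES = {"chained", "directly_chained"}

# Hypothetical reverse graph: child TEST name -> [(parent TEST name, dependency type)].
PARENTS: dict[str, list[tuple[str, str]]] = {
    "textmode": [("install", "chained")],
    "gnome": [("install", "directly_chained")],
    "client": [("server", "parallel")],
}


def with_parents(
    targets: list[str],
    parents: dict[str, list[tuple[str, str]]],
    skip_chained_deps: bool = False,
) -> list[str]:
    """Pull in parents of the targets recursively, optionally skipping chained ones."""
    order = list(dict.fromkeys(targets))  # de-duplicate while keeping the given order
    seen = set(order)
    queue = deque(order)
    while queue:
        job = queue.popleft()
        for parent, dep_type in parents.get(job, []):
            if skip_chained_deps and dep_type in CHAINED_TYPES:
                continue  # like _SKIP_CHAINED_DEPS: leave out chained/directly chained parents
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(with_parents(["textmode", "client"], PARENTS))
# prints ['textmode', 'client', 'install', 'server']
print(with_parents(["textmode", "client"], PARENTS, skip_chained_deps=True))
# prints ['textmode', 'client', 'server']
```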
Updated by livdywan over 1 year ago
- Subject changed from Allow partial product retrigger to Allow partial product retrigger size:M
- Description updated (diff)
- Category changed from Support to Feature requests
Updated by mkittler over 1 year ago
- Status changed from Workable to Feedback
A draft with the simplest change/test I can think of to implement the "missing bit" mentioned in the last paragraph of my previous comment: https://github.com/os-autoinst/openQA/pull/5096
Updated by mkittler over 1 year ago
- Status changed from Workable to Feedback
I've been updating https://github.com/os-autoinst/openQA/pull/5096. I'm waiting for feedback before continuing.
Updated by livdywan over 1 year ago
mkittler wrote:
I've been updating https://github.com/os-autoinst/openQA/pull/5096. I'm waiting for feedback before continuing.
Merged!
Updated by mkittler over 1 year ago
Yes, so what's left is the UI.
I suppose I'll add a context menu to the dependency tree with the menu item "Re-schedule product from here". Adding a button to every node in the tree would likely take too much space and clutter the graph. Maybe a changed mouse cursor could indicate the presence of the context menu.
I'm not sure whether restarting a scheduled product with additional/updated settings has been implemented yet, so possibly that would need to be implemented as well.
Updated by MDoucha over 1 year ago
mkittler wrote:
Yes, so what's left is the UI.
I suppose I'll add a context menu to the dependency tree with the menu item "Re-schedule product from here". Addition an additional button on every node in the tree would likely take too much space and clutter the graph too much. Maybe a changed mouse cursor could indicate the presence of the context menu.
I think the "restart job" drop down menu in the job result box would be a better place than the dependency graph.
Updated by mkittler over 1 year ago
Ok, and I suppose this would also be easier to implement anyway. I had just thought of the dependency graph because then it doesn't matter which job one is currently on, but I suppose one can simply navigate to the "start" job and restart from there.
Updated by mkittler over 1 year ago
PR: https://github.com/os-autoinst/openQA/pull/5233
I ended up adding another button in the "Scheduled product: …" line. Adding this in the drop down menu of the restart button would be problematic because we hide the restart button if restarting is not possible but it wouldn't make sense to disallow the partial retrigger in all such cases.
Updated by livdywan about 1 year ago
mkittler wrote:
PR: https://github.com/os-autoinst/openQA/pull/5233
I ended up adding another button in the "Scheduled product: …" line. Adding this in the drop down menu of the restart button would be problematic because we hide the restart button if restarting is not possible but it wouldn't make sense to disallow the partial retrigger in all such cases.
Merged.
Updated by tinita about 1 year ago
https://build.opensuse.org/package/live_build_log/devel:openQA/openQA/openSUSE_Leap_15.4/x86_64
unterminated quoted string literal at /usr/lib/perl5/vendor_perl/5.26.1/Mojolicious/Plugin/AssetPack/Pipe/JavaScript.pm line 22.
This seems to be a bug fixed in JavaScript::Minifier::XS, as it happens with version 0.14 but not with 0.15.
The changes listed at https://metacpan.org/dist/JavaScript-Minifier-XS seem unrelated, though.
Updated by tinita about 1 year ago
https://github.com/os-autoinst/openQA/pull/5264 Use regular single quotes in javascript code (merged)
Updated by mkittler about 1 year ago
Thanks for taking care of it. I guess we can now wait a little longer for user feedback.
Updated by okurz about 1 year ago
- Status changed from Feedback to Resolved
No negative feedback received so far, so resolving.