action #124469

Allow partial product retrigger size:M

Added by MDoucha 4 months ago. Updated 5 days ago.

Status: Feedback
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2023-02-14
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

Fixing job failures sometimes requires editing medium and testsuite settings. It'd be useful to have a job restart option that'll behave like partial isos post but only for the target job and its descendants, without restarting any parent jobs or parallel job dependency branches. The restarted jobs would be created from scratch using the original isos post settings and the current testsuite/medium/job group configuration. Unlike normal restart, job settings of the original failed/cancelled jobs would be ignored.

Acceptance criteria

  • AC1: It is clear how the partial product re-trigger is supposed to work (how the "part" is specified)
  • AC2: A solution exists to re-trigger a subset of tests re-evaluating scheduling settings (and not just re-triggering with the same settings)

Suggestions

  • Follow comments in the ticket

History

#1 Updated by okurz 4 months ago

  • Tags set to reactive work
  • Category changed from Feature requests to Support
  • Target version set to Ready

maybe something like this is already possible? Hence adding ticket to the backlog as "Support" to find out.

#2 Updated by okurz 4 months ago

  • Priority changed from Normal to Low

#3 Updated by mkittler 3 months ago

It would have also been great to give a concrete example. At least I am not quite sure how this is supposed to work in detail.

#4 Updated by MDoucha 3 months ago

Example:
Here we have a scheduled product which generated 238 jobs across multiple job groups: https://openqa.suse.de/admin/productlog?id=1740102
And from that scheduled product, I want to retrigger (automatically reload any testsuite/medium/job group changes) just this job and its 13 children: https://openqa.suse.de/tests/10596043

#5 Updated by mkittler 3 months ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee set to mkittler

> maybe something like this is already possible?

So it would be something like:

  1. You get all settings from the original scheduled product.
  2. You compile the list of jobs you care about somehow as a TEST setting, e.g. TEST=parent_job1,child_job1,child_job2.
  3. You make a new isos post call with the settings from steps 1 and 2.
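
The three steps above can be sketched roughly as follows. This is only an illustration, not openQA code: the helper name `build_partial_isos_post` and the sample product settings are made up, and the resulting dictionary would still have to be sent as parameters of an actual isos post call.

```python
def build_partial_isos_post(original_settings, test_names):
    """Return settings for a new isos post call limited to the given tests.

    Hypothetical helper for illustration; it only assembles parameters and
    does not talk to an openQA instance.
    """
    if not test_names:
        raise ValueError("at least one test name is required")
    # Step 1: start from the settings of the original scheduled product.
    settings = dict(original_settings)
    # Step 2: restrict scheduling via TEST, which accepts a comma-separated list.
    settings["TEST"] = ",".join(test_names)
    # Step 3: the caller passes these settings to a new isos post call.
    return settings


# Made-up example settings of an original scheduled product.
original = {"DISTRI": "sle", "VERSION": "15-SP5", "FLAVOR": "Online", "ARCH": "x86_64"}
params = build_partial_isos_post(original, ["parent_job1", "child_job1", "child_job2"])
print(params["TEST"])
```

Note that the list of test names is static here, which is exactly the limitation discussed in this comment.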

Of course, compiling the list of jobs you care about is the tricky part. For instance, in your example the list of children might even change when you amend testsuite settings. So it would really need to be re-computed, and a static list like in step 2 would not work. So I guess it is not possible right now. Besides, doing step 2 manually is very tedious (even if the list of children doesn't change).

This leads to the question: How do you want to select the relevant subset of jobs?

#6 Updated by mkittler 3 months ago

  • Description updated (diff)

#7 Updated by mkittler 3 months ago

I'll keep it as a support ticket. Once AC1 is clarified, we should estimate it.

#8 Updated by mkittler 3 months ago

> How do you want to select the relevant subset of jobs?

I suppose you could specify the target jobs via TEST=… as you already can right now (the variable supports a comma-separated list). To take children of those jobs into account as well you'd also specify _INCLUDE_CHILDREN=1 which would be a new setting. It would include all kinds of children (chained, parallel, directly chained) and recursively cover all children. Is that good enough or do you need to select by dependency type and limit the depth?
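
A minimal sketch of what the proposed _INCLUDE_CHILDREN=1 could mean, assuming the dependency graph is available as a plain mapping from a test name to its direct children. The function `include_children`, the mapping `children_of` and the test names are hypothetical; the sketch treats all dependency types alike and is not how openQA's scheduler is implemented.

```python
from collections import deque


def include_children(targets, children_of):
    """Return the target tests plus all of their descendants, recursively.

    `children_of` maps a test name to its direct children, regardless of the
    dependency type (chained, parallel, directly chained).
    """
    selected = list(dict.fromkeys(targets))  # keep order, drop duplicates
    seen = set(selected)
    queue = deque(selected)
    while queue:
        job = queue.popleft()
        for child in children_of.get(job, []):
            if child not in seen:
                seen.add(child)
                selected.append(child)
                queue.append(child)
    return selected


# Made-up dependency graph for illustration.
children_of = {
    "install": ["ltp_syscalls", "ltp_net"],
    "ltp_syscalls": ["ltp_syscalls_debug"],
    "unrelated": ["unrelated_child"],
}
selected = include_children(["install"], children_of)
print(",".join(selected))
```

Selecting by dependency type or limiting the depth would need extra parameters on top of this.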

I'm not sure how hard/easy that would be to implement right now and I'm also wondering how hard/easy it would be after https://github.com/os-autoinst/openQA/pull/4999 has been merged.

An additional usability issue would be that we currently only allow re-triggering a scheduled product with the same settings as before. So you'd have to manually craft a new isos post command. Is that acceptable?

#9 Updated by MDoucha 3 months ago

mkittler wrote:

> This leads to the question: How do you want to select the relevant subset of jobs?

Just like in the example I gave above, I'll typically want to retrigger a specific job and its dependency subtree. So I want to select by single job ID. Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.

I don't care whether the full retriggered job list will be the original one or the new one from job group config. But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.

#10 Updated by mkittler 3 months ago

> So I want to select by single job ID.

Ok, by job ID. That means the code which is scheduling a product needed to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?

> Ideally, the retrigger option would also be included in the web UI in the "restart job" drop-down menu.

I suppose that would be nice. It would be "Re-trigger scheduled product from here including child jobs". Since this is not a "restart" in the usual sense it may deserve a distinct icon/button - especially also since this way of re-triggering is also allowed if the job has already been restarted in the usual sense.

> I don't care whether the full retriggered job list will be the original one or the new one from job group config.

Then I'd make it the new one because that will be way easier to implement. (After all we'd just run a filtered version of the code for scheduling an ISO again.)

> But any individual job settings changes in the job group config must be applied to the corresponding retriggered jobs.

Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.


> I'm not sure how hard/easy that would be to implement right now …

Judging by the code we currently pull parents into the set of jobs to be triggered. So this feature would "just" be the reverse. I'll have to experiment myself to see whether this is actually how it behaves (because I might misunderstand the code). If it was true, then it might be easy to implement. It would also mean that https://github.com/os-autoinst/openQA/pull/4999 is not in the way (as the current implementation already de-duplicates this kind of dependency handling).

#11 Updated by MDoucha 3 months ago

mkittler wrote:

> Ok, by job ID. That means the code which is scheduling a product needed to map newly generated jobs to an existing job ID. Wouldn't it be more reliable to just specify the TEST name?

It'd be less convenient for the user to not allow job ID. If you need to match internally by TEST name, you can simply load it from the existing job.

> Yes. I suppose that's the point of the whole feature. By the way, what do we do if the amended settings would not lead to this job being created anymore at all? I suppose then you're just supposed to get a message stating that.

If the whole dependency subtree including the target job got removed, then there's nothing to retrigger if you strictly follow the new job group config. You could still optionally force-retrigger the target job itself with empty settings from the job group (so only testsuite and medium settings will apply).

#12 Updated by mkittler 3 months ago

Ok. Since this touches the same code as https://github.com/os-autoinst/openQA/pull/4999 (merged) I'd like to have this PR merged first. I'll give it a try when I have time although of course other issues have priority.

#13 Updated by mkittler 3 months ago

> I'll typically want to retrigger a specific job and its dependency subtree.

Looks like there's the existing scheduling variable _SKIP_CHAINED_DEPS that does exactly that. Except that it only affects chained and directly chained dependencies. You'd still get parallel parents. However, that's supposedly even what you actually want (because only re-triggering the parallel child without its parent is likely useless).

However, besides adding UI for using _SKIP_CHAINED_DEPS more conveniently, there's still something missing. If we specified the "starting point" via the TEST variable, we'd actually only get that job and no children. So we need a mechanism to pull in children automatically in the same way we currently pull in parents automatically (unless _SKIP_CHAINED_DEPS is specified).
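
To illustrate the behaviour described here, a toy model of pulling in parents with and without _SKIP_CHAINED_DEPS could look like this. It is a sketch under assumptions, not the actual scheduling code: `pull_in_parents`, `parents_of` and the dependency-type strings are made-up names, and the graph is invented.

```python
CHAINED_TYPES = {"chained", "directly_chained"}


def pull_in_parents(targets, parents_of, skip_chained_deps=False):
    """Return the target tests plus the parents that get pulled in.

    `parents_of` maps a test name to (parent, dependency_type) pairs. With
    `skip_chained_deps`, chained and directly chained parents are left out,
    while parallel parents are still pulled in.
    """
    selected = list(dict.fromkeys(targets))  # keep order, drop duplicates
    seen = set(selected)
    stack = list(selected)
    while stack:
        job = stack.pop()
        for parent, dep_type in parents_of.get(job, []):
            if skip_chained_deps and dep_type in CHAINED_TYPES:
                continue
            if parent not in seen:
                seen.add(parent)
                selected.append(parent)
                stack.append(parent)
    return selected


# Made-up graph: "client" runs in parallel with "server"; both need "install".
parents_of = {
    "client": [("server", "parallel"), ("install", "chained")],
    "server": [("install", "chained")],
}
with_parents = pull_in_parents(["client"], parents_of)
without_chained = pull_in_parents(["client"], parents_of, skip_chained_deps=True)
print(sorted(with_parents), sorted(without_chained))
```

In this model the children of the starting point are never selected, which is the missing piece mentioned above.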

#14 Updated by cdywan 2 months ago

  • Subject changed from Allow partial product retrigger to Allow partial product retrigger size:M
  • Description updated (diff)
  • Category changed from Support to Feature requests

#15 Updated by okurz about 2 months ago

  • Status changed from Feedback to Workable

#16 Updated by mkittler about 1 month ago

  • Status changed from Workable to Feedback

A draft with the simplest change/test I can think of to implement the "missing bit" mentioned in the last paragraph of my previous comment: https://github.com/os-autoinst/openQA/pull/5096

#17 Updated by okurz about 1 month ago

  • Status changed from Feedback to Workable

#18 Updated by mkittler 5 days ago

  • Status changed from Workable to Feedback

I've been updating https://github.com/os-autoinst/openQA/pull/5096. I'm waiting for feedback before continuing.
