coordination #110458

open

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

[epic] Improve `RETRY=…`-behavior for jobs with dependencies

Added by mkittler over 2 years ago. Updated almost 2 years ago.

Status: New
Priority: High
Assignee: -
Category: Feature requests
Target version:
Start date: 2022-04-29
Due date:
% Done: 0%
Estimated time:

Description

Observation

Jobs with a `RETRY=…` setting are automatically restarted when they fail, but the dependency handling apparently does not follow the normal restart behavior (the behavior you get, e.g., when clicking the restart button in the web UI).

For instance, here the root job has been restarted multiple times but none of the children have been restarted: https://openqa.suse.de/tests/8656146#dependencies

This also leads to a cluttered dependency graph in which multiple clones of the root job are present at the same time.

Acceptance criteria

  • AC1: Jobs are restarted in a more sensible way¹ regarding dependencies. Likely there's not one best way but the default should at least work better in most cases.
  • AC2: Potential concurrency issues which might be the culprit (or at least contribute to the overall problem) here are investigated and dealt with if needed (see #110458#note-4 for further details).

¹ What "more sensible" means exactly still has to be defined for each dependency type. It may make the most sense to go with the default behavior of the restart API.
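To make the difference concrete, here is a minimal toy model of the two behaviors. It is only a sketch and not openQA code: the `Job` class and the functions `retry_root_only` and `restart_with_children` are hypothetical names invented for illustration, and "clone the children too" is just one possible reading of "more sensible".

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple job ID generator for the toy model


@dataclass
class Job:
    # Hypothetical, simplified stand-in for a job with child dependencies.
    name: str
    children: list["Job"] = field(default_factory=list)
    id: int = field(default_factory=lambda: next(_ids))
    clone: "Job | None" = None


def retry_root_only(job: Job) -> Job:
    """Models the observed behavior: only the failed job itself is cloned.

    The clone points at the very same child jobs, so the children are
    never restarted and every retry adds another root clone to the graph.
    """
    job.clone = Job(job.name, children=job.children)
    return job.clone


def restart_with_children(job: Job) -> Job:
    """Models one possible 'more sensible' default (an assumption):
    the job is cloned together with all of its children."""
    new_children = [restart_with_children(child) for child in job.children]
    job.clone = Job(job.name, children=new_children)
    return job.clone
```

With `root = Job("root", [Job("child-a"), Job("child-b")])`, calling `retry_root_only(root)` yields a clone that still references the original children, whereas `restart_with_children(root)` yields a clone whose children are fresh clones as well.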

Further ideas

  • Allow the user to specify the retry behavior, similar to how it is already possible with the different parameters the restart API supports.

Related issues 1 (1 open, 0 closed)

Related to openQA Project (public) - action #112256: Some children of parent job not cancelled (or later, restarted) when parent `parallel_failed` due to another child's parallel job failing (New, 2022-06-09)
