coordination #110458
Updated by mkittler over 2 years ago
## Observation

Jobs with the [`RETRY=…` setting](https://open.qa/docs/#_automatic_retries_of_jobs) are restarted automatically when they fail. However, the dependency handling apparently does not match the normal restart behavior, which is what you get when you click the restart button in the web UI, for example. Here the root job has been restarted multiple times but none of its children have been restarted: https://openqa.suse.de/tests/8656146#dependencies

This also produces a confusing dependency graph in which multiple clones of the root job appear at the same time:

![](screenshot_20220429_100624.png)

## Acceptance criteria

* **AC1**: Jobs are restarted in a more sensible way¹ with regard to their dependencies. There is likely no single best way, but the default should at least work better in most cases.
* **AC2**: Potential concurrency issues that might be the culprit here (or at least contribute to the overall problem) are investigated and dealt with if needed (see #110458#note-4 for further details).

---

¹ We still have to define what "more sensible" means for each dependency type. It probably makes the most sense to follow the default behavior of [the restart API](https://open.qa/docs/#_handling_of_related_jobs_on_failure_cancellation_restart).

## Further ideas

* Allow the user to specify the retry behavior, similar to what is already possible with the parameters the restart API supports.
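This is a minimal Python sketch, not openQA code, that shows what a dependency-aware automatic retry could look like. It selects every job that needs to be cloned together with a failed job. The graph representation, the function name `jobs_to_retry` and the `include_children` flag are all hypothetical. The policy is also an assumption made for illustration and may differ from the restart API's actual defaults. Under that policy, children of any dependency type are retried transitively, and parallel dependencies are followed in both directions because parallel jobs can only run together.

```python
from collections import deque

# Hypothetical dependency type labels, named after the types openQA documents.
CHAINED, DIRECTLY_CHAINED, PARALLEL = "chained", "directly_chained", "parallel"


def jobs_to_retry(children, failed_job, include_children=True):
    """Return the set of job IDs to clone when `failed_job` is retried.

    `children` maps a job ID to a list of (child_id, dep_type) tuples.
    Assumed (hypothetical) policy:
    - the failed job itself is always retried;
    - if `include_children` is set, children of any dependency type
      are retried transitively;
    - parallel dependencies are always followed in both directions,
      because parallel jobs can only run together.
    """
    # Reverse edges are only needed for parallel dependencies.
    parallel_parents = {}
    for parent, edges in children.items():
        for child, dep_type in edges:
            if dep_type == PARALLEL:
                parallel_parents.setdefault(child, []).append(parent)

    # Breadth-first traversal starting from the failed job.
    selected = {failed_job}
    queue = deque([failed_job])
    while queue:
        job = queue.popleft()
        neighbours = [
            child
            for child, dep_type in children.get(job, [])
            if include_children or dep_type == PARALLEL
        ]
        neighbours.extend(parallel_parents.get(job, []))
        for other in neighbours:
            if other not in selected:
                selected.add(other)
                queue.append(other)
    return selected


if __name__ == "__main__":
    # Root job 1 has a chained child 2 and a directly chained child 3;
    # job 3 has a parallel child 4.
    graph = {1: [(2, CHAINED), (3, DIRECTLY_CHAINED)], 3: [(4, PARALLEL)]}
    print(sorted(jobs_to_retry(graph, 1)))
```

With the default flag, retrying root job 1 selects jobs 1 to 4, so the children are no longer left pointing at an outdated parent. Passing `include_children=False` reproduces the observed behavior of retrying only the root job. Exposing such a flag per job is one way the "Further ideas" item could be approached.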