coordination #110458

Updated by mkittler over 1 year ago

## Observation 
 Jobs with the [`RETRY=…` setting](https://open.qa/docs/#_automatic_retries_of_jobs) are automatically restarted when they fail, but the dependency handling apparently does not follow the normal restart behavior (the one you get, e.g., when clicking the restart button in the web UI). 

 For instance, here the root job has been restarted multiple times but none of the children have been restarted: https://openqa.suse.de/tests/8656146#dependencies 

 This also leads to a cluttered dependency graph in which multiple clones of the root job are present at the same time: 
 ![](screenshot_20220429_100624.png) 

 ## Acceptance criteria 
 * **AC1**: Jobs are restarted in a more sensible way¹ regarding dependencies. Likely there's not one best way but the default should at least work better in most cases. 
 * **AC2**: Potential concurrency issues which might be the culprit (or at least contribute to the overall problem) here are investigated and dealt with if needed. 

 --- 

 ¹ What "more sensible" means exactly still has to be defined for each dependency type. Maybe it makes most sense to go with [the behavior the restart API](https://open.qa/docs/#_handling_of_related_jobs_on_failure_cancellation_restart) has by default. 

 ## Further ideas 
 * Allow the user to specify the retry behavior, similar to how it is already possible with the different parameters the restart API supports.
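
 To illustrate the kind of parameters meant here, this is a minimal sketch of how a restart request with dependency-related flags could be assembled. The helper `build_restart_url` and the host name are hypothetical; the route `/api/v1/jobs/<id>/restart` and the flag names `skip_parents`, `skip_children` and `skip_ok_result_children` are assumptions based on the restart API documentation linked above and should be checked against it.

```python
from urllib.parse import urlencode


def build_restart_url(host, job_id, skip_parents=False, skip_children=False,
                      skip_ok_result_children=False):
    """Build the URL for a restart request (hypothetical helper, not part of openQA).

    The flag names are assumed to match the restart API documentation;
    only flags that are enabled end up in the query string.
    """
    flags = {
        "skip_parents": skip_parents,
        "skip_children": skip_children,
        "skip_ok_result_children": skip_ok_result_children,
    }
    query = urlencode({name: 1 for name, enabled in flags.items() if enabled})
    url = f"{host.rstrip('/')}/api/v1/jobs/{job_id}/restart"
    return f"{url}?{query}" if query else url


# Example: restart job 42 on a placeholder host without restarting its children.
print(build_restart_url("https://openqa.example.com", 42, skip_children=True))
```

 A retry-specific setting could accept the same flags so that automatic retries and manual restarts are configured in the same way; whether that is the right design is still open.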
