action #70618

Updated by mkittler over 4 years ago

### motivation 
 According to https://progress.opensuse.org/issues/69979#note-6 and the subsequent comments, people want to avoid restarting the directly chained parent as much as possible. Since the parent is usually a long-running job, that also makes sense. 

 ### suggestions 
 This ticket is far from workable. I'm just creating it to save and share an implementation idea I've just had. 

 1. Actually, restart the directly chained parent. If the parent has already been restarted, just use the clone of that restarted parent. (Yes, so far this does not sound like an improvement.) 
 2. When assigning the directly chained job cluster to a worker, prefer the previously used worker. We already track cloning history and which job has been executed on which worker so that part shouldn't be hard. 
 3. When sending the job to the worker: 
     1. Send the original job IDs from the old directly chained cluster to the worker as well. 
     2. Send a list of job IDs we would actually like to skip to the worker. That list would contain the IDs of directly chained parents. 
 4. The worker checks whether it ran no jobs other than the jobs from 3.1. If it ran other jobs, it will just execute all jobs as usual. If it did not run other jobs, it will skip the jobs from 3.2 and effectively not run the restarted parent jobs again. 
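
The worker-side decision from step 4 could be sketched roughly as follows. This is only an illustration in Python, not openQA code: the function name `jobs_to_execute`, its parameters, and the idea that the worker keeps a plain list of recently run job IDs are all assumptions made for the example.

```python
from typing import Iterable, List, Set


def jobs_to_execute(
    cluster_job_ids: List[int],
    original_job_ids: Iterable[int],
    skip_job_ids: Iterable[int],
    recently_run_job_ids: Iterable[int],
) -> List[int]:
    """Return the jobs of the restarted cluster the worker should really run.

    cluster_job_ids: jobs of the restarted cluster, in execution order.
    original_job_ids: IDs from the old directly chained cluster (step 3.1).
    skip_job_ids: IDs the web UI would like to skip (step 3.2).
    recently_run_job_ids: hypothetical worker-side history of executed jobs.
    """
    original: Set[int] = set(original_job_ids)
    skip: Set[int] = set(skip_job_ids)

    # Step 4: did the worker run anything besides the old cluster?
    ran_other_jobs = any(job_id not in original for job_id in recently_run_job_ids)
    if ran_other_jobs:
        # The state left by the old parent cannot be trusted: run everything.
        return list(cluster_job_ids)

    # Only the old cluster ran here: skip the restarted parents.
    return [job_id for job_id in cluster_job_ids if job_id not in skip]


# Old cluster: parent 100, child 101. Restarted cluster: parent 200, child 201.
print(jobs_to_execute([200, 201], [100, 101], [200], [100, 101]))       # [201]
print(jobs_to_execute([200, 201], [100, 101], [200], [100, 101, 150]))  # [200, 201]
```

The sketch follows step 4 literally, so a worker with an empty history would also skip the jobs from 3.2.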

 This way we would not change a lot in openQA, and I guess we would still achieve what the users are after. We would restart the parent "just in case" we really need to re-run it, and otherwise just skip the restarted job. It would even work when a worker is working for different web UIs. What do you think? Did I forget something?
