action #169747

Updated by livdywan about 1 month ago

## Observation 

 In #166772 I noticed that multiple minion jobs are created for the same openQA job. 

 The jobs I investigated were all incomplete; I did not check whether this also happens for passed/failed jobs. 

 Here is an example: 

 hook_script: 
 * https://openqa.opensuse.org/minion/jobs?id=4545183 
 * https://openqa.opensuse.org/minion/jobs?id=4545190 
 * https://openqa.opensuse.org/minion/jobs?id=4545193 

 finalize_job_results: 
 * https://openqa.opensuse.org/minion/jobs?id=4545182 
 * https://openqa.opensuse.org/minion/jobs?id=4545187 
 * https://openqa.opensuse.org/minion/jobs?id=4545191 

 https://openqa.opensuse.org/tests/4637440 

 `Reason: abandoned: associated worker qa-power8-3:4 re-connected but abandoned the job` 

 Also check 
 ``` 
 select id,
        concat('https://openqa.opensuse.org/tests/', args->1),
        task,
        started,
        state
   from minion_jobs
  where task = 'hook_script'
    and created >= '2024-11-10 11:39:00'
    and created <= '2024-11-12 11:42:00'
    and notes::varchar like '%hook_rc": 1%'
  order by started
  limit 100; 
 ``` 

 Multiple hook_script jobs for the same openQA job in particular could be problematic. 

 `enqueue_finalize_job_results` is called from `Jobs->done` and `Jobs->cancel`. 

 ## Acceptance Criteria 
 **AC1:** hook_script Minion jobs, at least, are not created multiple times for the same openQA job (maybe the same for finalize_job_results) 

 ## Suggestions 
 * Use database queries to find relevant duplicate Minion jobs and the reason why their openQA jobs ended up incomplete (maybe group by `args`) to find out 
   * If it happens only on incompletes or on all kinds of results 
   * If it happens only on those "reconnect" incompletes 
   * Then it might be easier to find out which code is calling `done` multiple times and why 
 * Ensure the `done`/`cancel` functions are only invoking the finalize job if the job hasn't been finalized yet 
 * Otherwise, make sure the finalize job runs hook scripts only once 
 * Consider adding a check within the hook script itself so it doesn't matter if it is invoked multiple times
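
 The first suggestion could start from a query like the following. This is only a sketch: it reuses the time window and the `args->1` position of the openQA job ID from the query above, and it assumes that the openQA `jobs` table exposes `result` and `reason` columns and that `args` can be read with `->>`. Adjust the task name and the argument index to look at `finalize_job_results` instead.

 ```
 -- Sketch: openQA jobs with more than one hook_script Minion job,
 -- together with the job result and reason (assumed columns of "jobs").
 select j.id as openqa_job_id,
        j.result,
        j.reason,
        count(*) as hook_script_jobs,
        array_agg(m.id order by m.created) as minion_job_ids
   from minion_jobs m
   join jobs j on j.id = (m.args->>1)::bigint  -- args->1 holds the openQA job ID, as in the query above
  where m.task = 'hook_script'
    and m.created >= '2024-11-10 11:39:00'
    and m.created <= '2024-11-12 11:42:00'
  group by j.id, j.result, j.reason
 having count(*) > 1                           -- keep only openQA jobs with duplicates
  order by hook_script_jobs desc
  limit 100;
 ```

 Grouping the output further by `result` or by a pattern on `reason` (for example `like '%re-connected but abandoned%'`) would show whether the duplicates are limited to incompletes or to the "reconnect" case.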
