action #169747
Updated by livdywan about 1 month ago
## Observation

In #166772 I noticed that multiple minion jobs are created for the same openQA job. The jobs I investigated were all incomplete, and I didn't research if this is happening also for passed/failed jobs. Here is an example:

hook_script:

* https://openqa.opensuse.org/minion/jobs?id=4545183
* https://openqa.opensuse.org/minion/jobs?id=4545190
* https://openqa.opensuse.org/minion/jobs?id=4545193

finalize_job_results:

* https://openqa.opensuse.org/minion/jobs?id=4545182
* https://openqa.opensuse.org/minion/jobs?id=4545187
* https://openqa.opensuse.org/minion/jobs?id=4545191

https://openqa.opensuse.org/tests/4637440
`Reason: abandoned: associated worker qa-power8-3:4 re-connected but abandoned the job`

Also check

```
select id,
       concat('https://openqa.opensuse.org/tests/', args->1),
       task,
       started,
       state
  from minion_jobs
 where task = 'hook_script'
   and created >= '2024-11-10 11:39:00'
   and created <= '2024-11-12 11:42:00'
   and notes::varchar like '%hook_rc": 1%'
 order by started
 limit 100;
```

Especially having multiple hook_script jobs for the same job could be problematic.

`enqueue_finalize_job_results` is called from `Jobs->done` and `Jobs->cancel`.

## Acceptance Criteria

**AC1:** At least hook_script minion jobs are not created multiple times on the same openQA job (maybe also finalize_job_results)

## Suggestions

* Use database queries to find relevant duplicate Minion jobs and the reason why their openQA jobs incompleted (maybe group by `args`) to find out
    * If it happens only on incompletes or on all kinds of results
    * If it happens only on those "reconnect" incompletes
* Then it might be easier to find out which code is calling `done` multiple times and why
* Ensure the `done`/`cancel` functions are only invoking the finalize job if the job hasn't been finalized yet
* Otherwise, make sure that from the finalize job hook scripts only run once
* Consider adding a check within the hook script itself so it doesn't matter if it is invoked multiple times
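
To illustrate the "only enqueue if not finalized yet" suggestion, here is a minimal sketch in Python of an idempotent enqueue guard keyed on the task name and the openQA job ID. It is not openQA code: `FinalizeQueue` and `enqueue_once` are hypothetical names, and the in-memory set only stands in for whatever shared state (for example a database column or a lock) a real fix would need, since `done` and `cancel` can be reached from different processes.

```python
from dataclasses import dataclass, field


@dataclass
class FinalizeQueue:
    """Hypothetical stand-in for the Minion queue; not openQA's real API."""

    # (task, openqa_job_id) pairs in the order they were actually enqueued
    enqueued: list = field(default_factory=list)
    # keys that have already been enqueued once
    seen: set = field(default_factory=set)

    def enqueue_once(self, task: str, openqa_job_id: int) -> bool:
        """Enqueue `task` for the job unless the same pair was enqueued before.

        Returns True if a new Minion job would be created, False if the
        request is a duplicate (e.g. a second call to done/cancel).
        """
        key = (task, openqa_job_id)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.enqueued.append(key)
        return True


if __name__ == "__main__":
    queue = FinalizeQueue()
    # Replay the situation from the example: three finalize requests
    # arrive for openQA job 4637440.
    results = [queue.enqueue_once("finalize_job_results", 4637440) for _ in range(3)]
    print(results)              # [True, False, False]
    print(len(queue.enqueued))  # 1
```

With such a guard the three requests seen for test 4637440 would result in a single `finalize_job_results` job, and the same key scheme could cover `hook_script`. Whether the guard belongs in `done`/`cancel`, in the finalize job, or in the hook script itself is exactly the open question in the suggestions above.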