action #169747

closed

coordination #102915: [saga][epic] Automated classification of failures

coordination #166655: [epic] openqa-label-known-issues

Multiple finalize_job_results and hook_script minion jobs per openQA job size:M

Added by tinita about 1 month ago. Updated 22 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-09-13
Due date:
% Done: 0%
Estimated time:

Description

Observation

In #166772 I noticed that multiple minion jobs are created for the same openQA job.

The jobs I investigated were all incomplete; I did not check whether this also happens for passed/failed jobs.

Here is an example:

hook_script:

finalize_job_results:

https://openqa.opensuse.org/tests/4637440

Reason: abandoned: associated worker qa-power8-3:4 re-connected but abandoned the job

Also check

select id, concat('https://openqa.opensuse.org/tests/', args->1), task, started, state
  from minion_jobs
  where task = 'hook_script'
    and created >= '2024-11-10 11:39:00'
    and created <= '2024-11-12 11:42:00'
    and notes::varchar like '%hook_rc": 1%'
  order by started
  limit 100;

Especially having multiple hook_script jobs for the same job could be problematic.
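
To spot such duplicates in the result of a query like the one above, the rows could be grouped by task and openQA job ID. The following is only a minimal Python sketch, assuming the rows were exported as (Minion job ID, task, openQA job ID) tuples; the function name `find_duplicates` and the sample rows are made up for illustration and are not real data from the Minion database.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group Minion job IDs by (task, openQA job ID).

    Returns only the groups that contain more than one Minion job,
    i.e. the openQA jobs for which a task was enqueued multiple times.
    """
    groups = defaultdict(list)
    for minion_id, task, openqa_job_id in rows:
        groups[(task, openqa_job_id)].append(minion_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample rows (hypothetical IDs), not taken from a real instance
rows = [
    (1001, "hook_script", 1),
    (1002, "hook_script", 1),
    (1003, "finalize_job_results", 1),
    (1004, "hook_script", 2),
]

duplicates = find_duplicates(rows)
print(duplicates)
```

With the sample rows, only the two hook_script entries for openQA job 1 are reported, because every other (task, job) pair occurs once.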

enqueue_finalize_job_results is called from Jobs->done and Jobs->cancel.

Acceptance Criteria

AC1: At least hook_script minion jobs are not created multiple times for the same openQA job (maybe also finalize_job_results)

Suggestions

  • Use database queries to find relevant duplicate Minion jobs and the reason why their openQA jobs ended up incomplete (maybe group by args) to find out:
    • If it happens only on incompletes or on all kinds of results
    • If it happens only on those "reconnect" incompletes
    • Then it might be easier to find out which code is calling done multiple times and why
  • Ensure the done/cancel functions are only invoking the finalize job if the job hasn't been finalized yet
  • Otherwise, make sure that the finalize job runs the hook scripts only once
  • Consider adding a check within the hook script itself so it doesn't matter if it is invoked multiple times
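
The "only finalize once" suggestion boils down to an idempotency guard around the enqueue call. The following is a minimal in-memory Python sketch of that idea, not openQA code: openQA itself is written in Perl, and a real guard would presumably have to keep its state in the database so that it holds across processes. The names `FinalizeGuard`, `enqueue_finalize_once`, `done` and `cancel` are hypothetical stand-ins for illustration.

```python
import threading


class FinalizeGuard:
    """Remembers which job IDs were already finalized (in memory only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finalized = set()

    def try_mark_finalized(self, job_id):
        """Return True only for the first caller per job ID."""
        with self._lock:
            if job_id in self._finalized:
                return False
            self._finalized.add(job_id)
            return True


guard = FinalizeGuard()
enqueued = []  # stands in for the Minion queue in this sketch


def enqueue_finalize_once(job_id):
    """Enqueue the finalize task unless it was already enqueued for this job."""
    if guard.try_mark_finalized(job_id):
        enqueued.append(("finalize_job_results", job_id))
        return True
    return False


def done(job_id):
    # Hypothetical stand-in for the "done" code path
    return enqueue_finalize_once(job_id)


def cancel(job_id):
    # Hypothetical stand-in for the "cancel" code path
    return enqueue_finalize_once(job_id)


print(done(42))    # first call enqueues the finalize task
print(cancel(42))  # second call for the same job is skipped
print(enqueued)
```

Checking and marking under one lock matters here: if the check and the update were separate steps, two concurrent callers could both see the job as not yet finalized and both enqueue.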

Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #166772: openqa-label-known-issues overrides size:S (Resolved, tinita, 2024-09-13)
