action #169510
Improve non-transactional creation of Minion jobs for Git updates when restarting jobs
Status: open, 0% done
Description
Observation
When restarting jobs, we invoke

OpenQA::App->singleton->gru->enqueue_git_clones(\%clones, \@clone_ids) if keys %clones;

outside of the transactions in which the new jobs are created. This is problematic because, for a moment, the new openQA jobs do not appear to be blocked by any Minion jobs, so the scheduler might assign them to workers before the Git update is done.
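The race can be illustrated with a toy model. This is a minimal Perl sketch, not openQA code: the hash-based job store and the helpers create_job, enqueue_git_update and assignable_jobs are hypothetical stand-ins for job creation, for what enqueue_git_clones achieves here (a Minion job that blocks the openQA jobs), and for the scheduler's query. It assumes, as described above, that the scheduler picks any "scheduled" job that is not blocked by a pending Minion job.

```perl
use strict;
use warnings;

# Hypothetical toy model (not openQA's schema): job ID => {state, blocked_by},
# where blocked_by lists IDs of pending Minion jobs the job must wait for.
my %jobs;

sub create_job {
    my ($id) = @_;
    $jobs{$id} = {state => 'scheduled', blocked_by => []};
}

# Hypothetical stand-in for enqueue_git_clones: one Minion job blocks the
# given openQA jobs.
sub enqueue_git_update {
    my ($minion_id, @job_ids) = @_;
    push @{$jobs{$_}{blocked_by}}, $minion_id for @job_ids;
}

# Hypothetical stand-in for the scheduler: 'scheduled' jobs with no blockers.
sub assignable_jobs {
    return sort { $a <=> $b } grep {
        $jobs{$_}{state} eq 'scheduled' && !@{$jobs{$_}{blocked_by}}
    } keys %jobs;
}

# Non-transactional restart: the job becomes visible before its blocker exists.
create_job(42);
my @seen_in_window = assignable_jobs();   # a scheduler run in between
enqueue_git_update(1, 42);
my @seen_after = assignable_jobs();

print "in window: @seen_in_window\n";     # prints "in window: 42"
print "after enqueue: @seen_after\n";     # nothing assignable any more
```

Job 42 is assignable only in the window between its creation and the enqueuing of the Git update, which is exactly the premature assignment AC1 rules out.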
See #169342#note-16 and the notes referenced there for further context. In short, this affects restarted/cloned jobs in production and should therefore be fixed.
Note that after https://github.com/os-autoinst/openQA/pull/6049 has been merged, the impact is limited to jobs being assigned before the related Git updates are done; there should no longer be any adverse consequences for parallel jobs.
Acceptance criteria
- AC1: The scheduler does not prematurely assign restarted jobs to workers while those jobs are still waiting on pending Minion jobs.
Suggestions
- The simplest solution would be to make the enqueuing part of the transactions in which we create the new jobs. This has already been implemented (see https://github.com/os-autoinst/openQA/pull/6048), but the solution is not ideal; see the comments on that PR.
- Introduce a new initial job state that comes before "scheduled" (the current initial job state), e.g. "preparing" or simply "new". The scheduler would ignore it, since it only looks for jobs in the "scheduled" state. Transitioning jobs to "scheduled" only after the Minion jobs have been created would therefore close the window. Error cases need to be handled so that jobs are never left in the new initial state forever. This approach would also require adjusting (or at least double-checking) all code that deals with job states.
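The second suggestion can be sketched in the same kind of toy model. Again this is a hypothetical Perl sketch, not openQA code: the "preparing" state name, restart_jobs and its enqueue callback are assumptions made for illustration. Jobs are created as "preparing", the Git update is enqueued, and only then are the jobs moved to "scheduled". On failure the sketch still moves the jobs to "scheduled" so none stays in the new state; whether openQA should instead mark such jobs as failed is an open design choice.

```perl
use strict;
use warnings;

# Hypothetical toy model (not openQA's schema): job ID => {state, blocked_by}.
# 'preparing' is the proposed new initial state; the scheduler ignores it.
my %jobs;

# Hypothetical stand-in for the scheduler: only 'scheduled' jobs without
# pending blockers are picked.
sub assignable_jobs {
    return sort { $a <=> $b } grep {
        $jobs{$_}{state} eq 'scheduled' && !@{$jobs{$_}{blocked_by}}
    } keys %jobs;
}

# Hypothetical restart flow: create as 'preparing', enqueue the Git update,
# then publish the jobs as 'scheduled'.
sub restart_jobs {
    my ($enqueue, @job_ids) = @_;
    $jobs{$_} = {state => 'preparing', blocked_by => []} for @job_ids;
    my $ok = eval { $enqueue->(@job_ids); 1 };
    warn "enqueuing Git updates failed: $@" unless $ok;
    # Leave 'preparing' even on failure so no job is stuck there forever
    # (assumed fallback; marking the jobs as failed would be an alternative).
    $jobs{$_}{state} = 'scheduled' for @job_ids;
    return $ok;
}

# Successful restart: a concurrent scheduler run sees nothing to assign.
my @seen_during;
my $restarted = restart_jobs(sub {
    my @ids = @_;
    @seen_during = assignable_jobs();             # simulated scheduler run
    push @{$jobs{$_}{blocked_by}}, 1 for @ids;    # Minion job 1 blocks them
}, 42);

# Failed enqueue: job 43 still ends up 'scheduled' instead of stuck.
my $failed = restart_jobs(sub { die "Minion unavailable\n" }, 43);

print "assignable: @{[assignable_jobs()]}\n";     # prints "assignable: 43"
```

Note that eval only covers errors raised inside the process; if the process dies between creating the jobs and the final transition, jobs would still be left in "preparing", so some periodic cleanup would likely be needed as well.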