action #176067

closed

coordination #58184: [saga][epic][use case] full version control awareness within openQA

coordination #152847: [epic] version control awareness within openQA for test distributions

action #169510: Improve non-transactional creation of Minion jobs for Git updates when restarting jobs size:M

[spike][timeboxed:10h] Explore alternatives to improve non-transactional creation of Minion jobs for Git updates when restarting job size:S

Added by mkittler about 2 months ago. Updated 21 days ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Feature requests
Target version:
Start date:
2025-01-23
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: We know the best way to move forward with #169510.

Suggestions

  • Bring https://github.com/os-autoinst/openQA/pull/6048 into a mergeable state, making the change configurable so we can try it out in production.
    • This has the benefit of going with a solution we (almost) already have in case it succeeds. It requires only a few changes in code.
  • Move everything into one big transaction.
    • This requires some refactoring but is probably also not such a big deal. The only concern is that big transactions are generally discouraged.
      • Maybe not such a big concern as we only append rows to a few tables. So conflicts with other parallel transactions are unlikely.
      • Note that one probably needs to restart jobs in batches anyway to prevent timeouts (because the runtime of multiple smaller transactions adds up as well). By using one big transaction it would at least be "all or nothing".
  • Introduce a new initial job state (perhaps just called "new"). This is maybe the most future-proof solution.
    • See how far you can get within the timebox. There is a lot to do and it would be good to know how feasible/big the different tasks are:
      • Add a new constant and extend the relevant "meta state" as well.
      • Ensure proper state transitions in all relevant code (also in error cases), e.g. let a job transition from "new" to "scheduled" only if all Minion jobs are done (or if there were no Minion jobs created after all).
      • Let the scheduler not consider those jobs yet. (This is the whole point of the idea.)
      • Update the UI (controller code, HTML templates, CSS) and API (controller code, validation helpers) to correctly handle the new state where needed.
      • Update the Python library and potentially other tooling such as openqa-monitor as needed.
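The state handling described for the third suggestion can be illustrated with a minimal sketch. It is not openQA code (openQA itself is written in Perl); the class `Job`, the helpers `promote_if_ready` and `schedulable` and the constant names are hypothetical and only model the rule that a job leaves "new" once all of its Minion jobs are done, or immediately if none were created. Handling of failed Minion jobs is deliberately left open here.

```python
from dataclasses import dataclass, field

# Only "new" and "scheduled" are named in the ticket; the constants are illustrative.
NEW = "new"
SCHEDULED = "scheduled"

# Assumed name for the state of a successfully completed Minion job.
MINION_DONE = "finished"


@dataclass
class Job:
    id: int
    state: str = NEW
    # States of the Minion jobs (e.g. Git updates) this job still depends on.
    minion_states: list = field(default_factory=list)


def promote_if_ready(job: Job) -> bool:
    """Move a job from "new" to "scheduled" once all its Minion jobs are done.

    A job without any Minion jobs is promoted right away.
    Returns True if the job was promoted by this call.
    """
    if job.state != NEW:
        return False
    if all(state == MINION_DONE for state in job.minion_states):
        job.state = SCHEDULED
        return True
    return False


def schedulable(jobs):
    """Return only the jobs a scheduler may consider, skipping "new" jobs."""
    return [job for job in jobs if job.state == SCHEDULED]


if __name__ == "__main__":
    jobs = [
        Job(1),                              # no Minion jobs were created
        Job(2, minion_states=["active"]),    # Git update still running
        Job(3, minion_states=["finished"]),  # Git update done
    ]
    for job in jobs:
        promote_if_ready(job)
    print([job.id for job in schedulable(jobs)])  # prints [1, 3]
```

Keeping the filter in one place (`schedulable` in this sketch) reflects the point of the idea: the scheduler never sees a job whose Git updates are still pending.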

Maybe it makes most sense to give each top-level suggestion a try (moving on to the next suggestion after a certain amount of time was spent, even if not done yet).

Actions #1

Updated by okurz about 1 month ago

  • Priority changed from Normal to High
Actions #2

Updated by mkittler about 1 month ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler about 1 month ago

  • Status changed from Workable to In Progress
Actions #4

Updated by mkittler about 1 month ago · Edited

I updated https://github.com/os-autoinst/openQA/pull/6048. I hope the CI checks will pass. From a code perspective it would be quite messy to make this change configurable, so I haven't done that yet. From a user/admin perspective it is probably also not that beneficial to have such a feature switch (as it is probably not easy to understand what its effect will be).

I'll create a draft for creating a new job state.
EDIT: https://github.com/os-autoinst/openQA/pull/6157

Actions #5

Updated by openqa_review about 1 month ago

  • Due date set to 2025-02-20

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by mkittler about 1 month ago · Edited

https://github.com/os-autoinst/openQA/pull/6048 (moving Minion job creation into transactions) has been merged and deployed on o3 and on OSD.
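As a rough illustration of what "moving Minion job creation into transactions" means, the following sketch uses Python's `sqlite3` module rather than openQA's actual Perl code and database. The table layout, the function `restart_job_with_git_update` and the `fail_enqueue` switch are hypothetical, and the sketch assumes that job rows and Minion job rows live in the same database so that one transaction can cover both.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two simplified, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, clone_of INTEGER, state TEXT)"
    )
    conn.execute(
        "CREATE TABLE minion_jobs (id INTEGER PRIMARY KEY, task TEXT, job_id INTEGER)"
    )
    conn.commit()
    return conn


def restart_job_with_git_update(conn, original_job_id, fail_enqueue=False):
    """Create a restarted job and its Git update task atomically.

    fail_enqueue simulates an error between the two inserts.
    """
    # "with conn" commits on success and rolls back if an exception is raised,
    # so either both rows exist afterwards or neither does.
    with conn:
        cur = conn.execute(
            "INSERT INTO jobs (clone_of, state) VALUES (?, 'scheduled')",
            (original_job_id,),
        )
        new_job_id = cur.lastrowid
        if fail_enqueue:
            raise RuntimeError("simulated failure while enqueuing the Git update")
        conn.execute(
            "INSERT INTO minion_jobs (task, job_id) VALUES ('git_clone', ?)",
            (new_job_id,),
        )
    return new_job_id


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        restart_job_with_git_update(conn, 42, fail_enqueue=True)
    except RuntimeError:
        pass
    # The failed attempt left no half-created job behind.
    print(conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])  # prints 0
```

Without the surrounding transaction, the failed attempt would leave a restarted job in the database that has no corresponding Git update.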

Let's see how it goes before trying to improve https://github.com/os-autoinst/openQA/pull/6157 (new job state) further. This needs further discussion anyway.

Actions #7

Updated by mkittler about 1 month ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by mkittler about 1 month ago

  • Status changed from Feedback to Resolved

I enabled auto updating on OSD yesterday, see #168376#note-33. So far I haven't seen any failing Minion jobs. There are many succeeding git_clone jobs now and the needle-related Minion jobs I found were also successful.

So for now I would call this approach (the first suggestion on this ticket) good enough. We can still re-open the ticket to look into the other alternatives if we see problems later after all.

Actions #9

Updated by okurz 21 days ago

  • Due date deleted (2025-02-20)