action #97118


coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

enhance bot automatic approval: check multiple days

Added by mgrifalconi almost 3 years ago. Updated about 1 month ago.

Right now the bot will approve an incident only if all tests that include that incident are green at the same point in time.

Day 1 after incident creation: Test 1 to 10 are green, test 11 fails.
Day 2 after incident creation: Test 1 fails, test 2 to 11 are green.
Day 3 after incident creation: Test 1 is green, test 2 fails, test 3 to 11 are green.

You get the point.
The update is likely fine, since all tests were green at least once with the same update code.

While we always work to improve test stability, it is tricky to find a time when "everything is green" due to ongoing test development, new updates that break aggregate runs, infra hiccups, etc.

Right now cross-checking such results is slow manual work and very error-prone.

The bot could check across multiple days to make sure every test was green at least once and then approve the update.

Implementation proposal:
Use the same logic that checks the latest run, but if a failure is found, go back in history and check whether there was a green result, going back as far as the incident creation date.

Related issues: 2 (1 open, 1 closed)

Related to QA - action #104209: [qem] checkpoints for aggregates (Rejected, okurz, 2021-12-21)

Related to openQA Project - action #122296: Fix openqa-trigger-bisect-jobs to actually remove incidents from bisection tests again (New, szarate, 2024-02-13)

Actions #1

Updated by okurz almost 3 years ago

  • Project changed from openQA Project to QA
  • Target version set to future
Actions #2

Updated by dzedro over 2 years ago

I would lock the result on the dashboard for a passed job group for the update, add a check box or color to make it visible, and we are done.
Right now we have constant overlapping and restarting of failures and waiting for results.

Actions #3

Updated by kraih over 2 years ago

The required data to implement this should be present in the dashboard already. It could provide a new REST API endpoint for the bot.
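As an illustration, such an endpoint could return per-day job results which the bot then filters. The payload shape and field names below are purely hypothetical, not the actual qem-dashboard schema:

```python
import json

# Hypothetical response shape for a history endpoint; the real
# qem-dashboard schema may differ.
sample = json.loads("""
{"incident": 27251,
 "jobs": [
   {"test": "Test1", "day": "2022-12-20", "result": "passed"},
   {"test": "Test1", "day": "2022-12-21", "result": "failed"},
   {"test": "Test2", "day": "2022-12-21", "result": "passed"}
 ]}
""")

def green_tests(payload):
    """Tests that passed at least once in the returned window."""
    return {j["test"] for j in payload["jobs"] if j["result"] == "passed"}
```

The bot would then only need to verify that `green_tests` covers every test relevant to the incident.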

Actions #4

Updated by okurz over 2 years ago

  • Related to action #104209: [qem] checkpoints for aggregates added
Actions #5

Updated by okurz over 2 years ago

  • Parent task set to #80194
Actions #6

Updated by okurz over 1 year ago

The best idea I have currently is: if we trigger the aggregate tests with no obsoletion, i.e. let jobs in an old build continue rather than cancelling them when a new build is triggered, then we could just look at the last finished build, as has been done for years. I think this would effectively achieve the original goal, with the advantage of being easier to implement and easy to understand. The requirement for this to work is enough resources to finish one complete build within a day, but if that is a challenge then we have a different problem anyway ;)

Originally posted in #104209#note-14

Actions #7

Updated by mgrifalconi over 1 year ago

@okurz I am not sure I understand what you mean. Most aggregate jobs (if not all) do finish in time before the next day run, so obsolescence should not be a factor.

Actions #8

Updated by okurz over 1 year ago

mgrifalconi wrote:

@okurz I am not sure I understand what you mean. Most aggregate jobs (if not all) do finish in time before the next day run, so obsolescence should not be a factor.

Well, the problem is for sure not the initially triggered jobs but any retries, which are likely triggered manually by reviewers, right? What other reasons could there be for unhandled, unignored, unfinished jobs, except that test maintainers still keep too many unstable tests?

Actions #9

Updated by mgrifalconi over 1 year ago

I am still not convinced that such a change could have a major impact. I would focus on failed (and not manually soft-failed) jobs instead of unfinished ones. If a test is still running at the end of the day, that would mean to me that it went through a crazy number of restarts or it is way too long.

The situation I propose to solve is like what I tried to explain in the ticket description; here is a more detailed example:

Update1 needs Test1 and Test2 to be green, since they are both aggregate tests for the same SLE version.

Day1: Test1 is green (on the first try, or maybe after a retry that finishes well before the end of the day). Test2 fails. => Not approved
Day2: Test1 fails. Test2 is green. => Not approved

I would like the bot on Day2 to look at the history and, since Test1 was green for Update1 on Day1, approve Update1.
This is exactly what happened during the openQA review task, manually, before we decided to be more strict and reduce manual force-approvals.
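The Day1/Day2 scenario above can be expressed as a small worked example (illustrative data only, not the bot's real data structures):

```python
# Worked example of the Day1/Day2 scenario: Test1 green on Day1,
# Test2 green on Day2, so the update is approvable even though no
# single day was all green.
history = {
    "Day1": {"Test1": "passed", "Test2": "failed"},
    "Day2": {"Test1": "failed", "Test2": "passed"},
}

def update_approvable(history):
    """Approve if every test passed on at least one day."""
    tests = {t for day in history.values() for t in day}
    return all(
        any(day.get(t) == "passed" for day in history.values())
        for t in tests
    )
```

Here `update_approvable(history)` returns True, which matches the manual cross-checking reviewers used to do.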

I guess this could be achieved as proposed by Jozef in #2.

This would also benefit squads, allowing more time for fixing and improving tests instead of soft-failing or asking for manual approvals.

Actions #10

Updated by okurz over 1 year ago

Discussed with coolo and the following day with hrommel1 and mgrifalconi. Additional ideas we can follow:

  1. Run all aggregate tests specifically per incident as an experiment to see if it really exhausts our hardware resources. okurz assumes that this is not a problem, which needs proof
  2. It seems the aggregate openQA investigate jobs "without_X" actually do not remove X, as the repo is already included as part of the image generation job. os-autoinst-distri-opensuse needs to be changed to remove the repo and any packages that were installed from it -> #122296
  3. qem-dashboard sometimes seems to have inconsistent results. Maybe instead of relying on the content of the dashboard database we should change the relevant places to look at the most recent results in openQA directly -> #122311
  4. Extend the bot to not only look at the last failure but also check whether a more recent openQA result passed, and then still accept. Also go back in time to check whether a result from the past days including the incident was ok. This would fulfill the original feature request

EDIT: We discussed this within SUSE QE Tools and we encountered one additional question:

(Oliver Kurz) @Stephan Kulow regarding the problems of SLE maintenance update review and needing a dashboard: why do people feel the need to forcefully approve SLE maintenance updates, while in the Factory staging workflow it works that people really need to have passing openQA tests to accept?
(Stephan Kulow)

  1. product failures in one staging project don't affect other staging projects (no aggregation)
  2. Dimstar will have no problem reverting test changes or test additions if it affects staging projects, while SLE reviewers are still very reluctant to pull that card
  3. staging openQA tests have a very minimal (and as such more stable) coverage

(Oliver Kurz) Regarding 1.: but still every staging project can have multiple submissions, same as "all current incidents"

So my hypothesis is: if SLE maintenance updates used the staging workflow, we wouldn't have to maintain multiple solutions, and for everything else (points 2 and 3) we should assume that the same process solutions would need to be applied.

Looking into the most recent run of "approve" we found more problems:

2022-12-21 13:34:16 INFO     Job 1967173 not found 
2022-12-21 13:34:16 INFO     Job 1967169 not found 
2022-12-21 13:34:16 INFO     Found failed, not-ignored job 57268 for incident 27251
2022-12-21 13:34:16 INFO     Inc 27251 has at least one failed job in aggregate tests
2022-12-21 13:34:16 INFO     Found failed, not-ignored job 1967179 for incident 27252

-> #122308

Actions #11

Updated by okurz over 1 year ago

  • Related to action #122296: Fix openqa-trigger-bisect-jobs to actually remove incidents from bisection tests again added
Actions #12

Updated by mgrifalconi 3 months ago

  • Status changed from New to In Progress
  • Assignee set to mgrifalconi
Actions #13

Updated by livdywan 3 months ago

  • Status changed from In Progress to Feedback
Actions #14

Updated by okurz about 1 month ago

  • Status changed from Feedback to Resolved

The change is effective and no major issues have been found so far. Considered resolved.

