action #104209

closed

coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

[qem] dashboard.qam.suse.de checkpoints for aggregates

Added by hurhaj over 2 years ago. Updated over 1 year ago.

Status: Rejected
Priority: Normal
Assignee:
Target version:
Start date: 2021-12-21
Due date:
% Done: 0%
Estimated time:
Description

Scenario:
An update has been in the queue for a week, but qam-openqa is not yet approved. Looking into d.q.s.d/blocked you can see that some aggregate job groups failed and some are still running. But checking past runs, one can see that all of them have already passed at some point since the update was added.

The problem is that currently the dashboard wants everything green at the same time to approve an update. That rarely happens (let's not get sidetracked on this claim). A solution would be for the dashboard to check each aggregate job group only until it is green for the first time and then disregard any later runs.
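
A minimal sketch of this checkpoint idea (hypothetical data model, not the dashboard's actual code): an update becomes approvable once every required aggregate job group has been green at least once since the update entered the queue, and later failed reruns are disregarded.

```python
from datetime import datetime

def ready_for_approval(update, aggregate_runs):
    """update: dict with 'queued_at' (datetime) and 'required_groups' (set of group names).
    aggregate_runs: iterable of dicts with 'group', 'finished_at' (datetime) and 'result'."""
    passed_groups = {
        run["group"]
        for run in aggregate_runs
        if run["result"] == "passed" and run["finished_at"] >= update["queued_at"]
    }
    # Approvable once every required group was green at least once since the update was added.
    return update["required_groups"] <= passed_groups

# Illustrative data only:
update = {"queued_at": datetime(2021, 12, 14), "required_groups": {"Maintenance: SLE 15 SP3 Updates"}}
runs = [
    {"group": "Maintenance: SLE 15 SP3 Updates", "finished_at": datetime(2021, 12, 16), "result": "passed"},
    {"group": "Maintenance: SLE 15 SP3 Updates", "finished_at": datetime(2021, 12, 18), "result": "failed"},
]
print(ready_for_approval(update, runs))  # True: the later failed rerun is disregarded
```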


Related issues 2 (2 open, 0 closed)

Related to QA - action #97274: qam dashboard improvement ideas (New, 2021-06-29)
Related to QA - action #97118: enhance bot automatic approval: check multiple days (Feedback, mgrifalconi, 2021-08-18)

Actions #1

Updated by okurz over 2 years ago

  • Target version set to future

A valid feature request for the future. I assume that means that some component would need to always store the highest "watermark" per incident and approve as soon as the watermark exceeds a threshold.

Actions #2

Updated by mgrifalconi over 2 years ago

Hello, that's great to see more agreement on this topic, but I feel this is a duplicate of a problem already mentioned here:

Actions #3

Updated by okurz over 2 years ago

  • Related to action #97274: qam dashboard improvement ideas added
Actions #4

Updated by okurz over 2 years ago

  • Related to action #97118: enhance bot automatic approval: check multiple days added
Actions #5

Updated by hurhaj over 2 years ago

mgrifalconi wrote:

Hello, that's great to see more agreement on this topic, but I feel this is a duplicate of a problem already mentioned here:

Right, it seems like two similar solutions for the same problem. Anyway, I guess this issue is currently the biggest issue in openQA review. Not only are updates not being approved, but the reviewer has to check every failed and still running job group. So I believe this feature should be planned for the near future :)

Actions #6

Updated by okurz over 2 years ago

hurhaj wrote:

So I believe this feature should be planed for near future :)

I assume that the effort to implement this properly would be pretty high because we also need traceability why releases have been approved. And if we just say "random jobs happened to pass at random times but not all together" then this will be hard to follow unless we save the corresponding test results from the times that they passed. I would be surprised if handling the actual job failures in openQA is not a better solution. There are multiple ways to make "failed jobs disappear", among others:

  1. Fix the actual test failure cause (always the preferred choice of course)
  2. Implement a workaround with a soft-failure so that other test modules are still executed
  3. Retriggering known sporadic issues, at best with auto-review
  4. Remove failing tests (or move to the development groups)
  5. Overwrite the results using http://open.qa/docs/#_overwrite_result_of_job
  6. Retry of openQA jobs based on test variables

If all of that does not work, I think we have a more severe problem than needing this dashboard feature. So how can I help to get people to use the above, or is there something else missing that we can do?

EDIT: 2021-12-22 Added https://github.com/os-autoinst/openQA/pull/4422 as additional option
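
A minimal sketch of how option 5 could be applied to a known-bad job without re-running anything, assuming the `label:force_result` comment syntax from the linked documentation and the standard `openqa-cli` client; the job id and ticket reference are placeholders:

```python
import subprocess

def overwrite_result(job_id: int, new_result: str, reason: str) -> None:
    # The force_result label is picked up by openQA and overrides the job result;
    # openqa-cli reads the API key/secret from its usual client configuration.
    comment = f"label:force_result:{new_result}:{reason}"
    subprocess.run(
        ["openqa-cli", "api", "--osd", "-X", "POST", f"jobs/{job_id}/comments", f"text={comment}"],
        check=True,
    )

overwrite_result(7654321, "softfailed", "poo#104209")  # placeholder job id
```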

Actions #7

Updated by hurhaj over 2 years ago

That's what we're doing, and we are still in the situation that led me to open this issue.

We're releasing around 70 updates (not packages) weekly. The situation in the repository changes too fast to hope for an ideal state. Meanwhile, I'm going through dozens of unapproved updates that have been hanging in the queue for weeks, hoping I didn't miss something.

Actions #8

Updated by hurhaj over 2 years ago

okurz wrote:

I assume that the effort to implement this properly would be pretty high

Fully aware, I'm not expecting a Christmas miracle here.

because we also need traceability why releases have been approved

Maybe the bot could both comment in IBS with links to the passed jobs and approve?
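
A purely hypothetical sketch of that traceability idea; `post_ibs_comment` and `accept_qam_openqa_review` are stand-ins for whatever osc/IBS calls the bot actually uses:

```python
def approve_with_traceability(request_id, passed_jobs, post_ibs_comment, accept_qam_openqa_review):
    """passed_jobs: dicts with 'group', 'id' and 'finished_at' describing the green runs
    the approval is based on; the two callables are stand-ins for the bot's IBS access."""
    lines = ["Approving based on these openQA results:"]
    lines += [
        f"- {job['group']}: https://openqa.suse.de/tests/{job['id']} (passed {job['finished_at']})"
        for job in passed_jobs
    ]
    post_ibs_comment(request_id, "\n".join(lines))   # leaves the audit trail in IBS
    accept_qam_openqa_review(request_id)             # then accepts the qam-openqa review
```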

Actions #9

Updated by okurz over 2 years ago

hurhaj wrote:

That's what we're doing and we are in a situation in which I decided to open this issue.

It feels like dozens of QA engineers in QE still don't provide the work that is needed to stabilize unstable or false-positive tests. We are already trying in multiple issues to address not only the technical parts but also the process-related ones, e.g. in #96543, #95479, #91649, #103656, #102197, #101355, #101187

We're releasing around 70 updates (not packages) weekly. Situation in repository is changing too fast to hope for ideal situation. Meanwhile, I'm going through dozens of unapproved updates, hanging in the queue for weeks, hoping I didn't miss something.

That's of course not how it should be, and it shouldn't be necessary for you to clean up the queue this way.

One other thought: Would it help to move more tests from aggregate into incident tests?

hurhaj wrote:

because we also need traceability why releases have been approved

Maybe bot could both comment in IBS with links to passed jobs and approve?

That would be a list of individual jobs, as there is no corresponding view in openQA that shows the test results for incidents at different points in time.

Actions #10

Updated by okurz over 2 years ago

  • Parent task set to #80194
Actions #11

Updated by okurz over 2 years ago

Discussed with hurhaj. We agreed that with an increasing number of products, pending maintenance updates and tests, the situation of "not all tests can ever pass at the same time" is becoming more likely.

okurz wrote:

I assume that the effort to implement this properly would be pretty high because we also need traceability why releases have been approved. And if we just say "random jobs happened to pass at random times but not all together" then this will be hard to follow unless we save corresponding test results from the corresponding times that they have passed.

We agreed that this is a risk, although in hurhaj's experience such cases where an after-the-fact investigation would be needed either did not happen at all or happened so seldom that we don't need to care about them.

I would be surprised if handling the actual job failures in openQA is not a better solution. There are multiple ways to make "failed jobs disappear", among others:

  1. Fix the actual test failure cause (always the preferred choice of course)
  2. Implement a workaround with a soft-failure so that other test modules are still executed
  3. Retriggering known sporadic issues, at best with auto-review
  4. Remove failing tests (or move to the development groups)
  5. Overwrite the results using http://open.qa/docs/#_overwrite_result_of_job
  6. Retry of openQA jobs based on test variables

From all of the above options only 5. "Overwrite the results using http://open.qa/docs/#_overwrite_result_of_job" could be done without re-running tests, which has the benefit of needing no additional waiting time. I presented to hurhaj https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger and in particular the feature to use auto-review+force-result.

According to hurhaj that would not help in the following case: update X is faulty and blocks approval of Y because aggregate tests including X+Y fail, there is no time to reject X and rerun all tests without X, or the likelihood of a newly added update Z blocking approval of Y is high. The requirement here would be similar to #95479: mark a failed job as acceptable, i.e. the same as passed, but only for Y.
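
A minimal sketch of that per-incident acceptance (hypothetical data model, not an existing dashboard feature): a reviewer acknowledges a failed aggregate job for Y only, so Y becomes approvable while X stays blocked.

```python
def blocks_approval(job, incident_id, acknowledged):
    """acknowledged: set of (job_id, incident_id) pairs a reviewer marked as acceptable."""
    if job["result"] == "passed":
        return False
    return (job["id"], incident_id) not in acknowledged

acknowledged = {(7654321, "Y")}                  # failure caused by X, accepted for Y only
failed_job = {"id": 7654321, "result": "failed"}
print(blocks_approval(failed_job, "Y", acknowledged))  # False: Y can be approved
print(blocks_approval(failed_job, "X", acknowledged))  # True: X stays blocked
```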

Actions #12

Updated by hurhaj over 1 year ago

As we are still fighting a needlessly long openQA queue, I would like to bring attention back to this ticket and #97118 (which is pretty much the same, but was opened earlier). Jozef wrote a possible solution in #97118. Could it be implemented?

Actions #13

Updated by okurz over 1 year ago

I would like to help, but first I want to ensure we really understand the actual problem before we go further.

hurhaj wrote:

As we are still fighting a needlessly long openQA queue,

What queue do you mean in particular? Do you mean the queue for "qam-openqa" review in IBS and the corresponding openQA tests that block auto-approval?

I would like to bring attention back to this ticket and #97118 (which is pretty much the same, but was opened earlier). Jozef wrote a possible solution in #97118. Could it be implemented?

I think it's possible, but it will make it even harder for reviewers to understand what's blocking auto-approval. I expect that such a checkpoint implementation would help for a limited time until tests deteriorate even further and more random issues block maintenance updates. I have the feeling that most QE engineers working with openqa.suse.de are missing some motivation to help "make tests green". I suspect some people don't even know that their changes impact SLE maintenance tests. A similar thing happens e.g. for Tumbleweed, but there owners, e.g. DimStar, are diligently pointing issues out to test owners and making sure that unstable, unreliable tests are removed. I don't see that commonly happening with SLE maintenance tests, so I suggest working on that.

Actions #14

Updated by okurz over 1 year ago

One idea: if we trigger the aggregate tests with no obsoletion, i.e. let jobs in an old build continue and not be cancelled when a new build is triggered, then we could look only at the last finished build, as https://github.com/os-autoinst/openqa_review has done for years. I think this would effectively achieve the original goal with the advantage of being easier to implement and easier to understand. The requirement for this to work is enough resources to be able to finish one complete build within a day, but if that is a challenge then we have a different problem anyway ;)
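
A minimal sketch of that approach, assuming a simple per-build job listing (placeholder data, not openqa_review's actual code): without obsoletion every build eventually finishes, so approval can be decided on the newest build in which all jobs are done.

```python
def latest_finished_build(jobs_by_build):
    """jobs_by_build: mapping of build id -> list of job dicts with 'state' and 'result'.
    Returns the newest build in which all jobs are done and whether it is fully green."""
    for build in sorted(jobs_by_build, reverse=True):
        jobs = jobs_by_build[build]
        if all(job["state"] == "done" for job in jobs):
            return build, all(job["result"] == "passed" for job in jobs)
    return None, False

jobs_by_build = {
    "20220103-1": [{"state": "running", "result": "none"}],
    "20220102-1": [{"state": "done", "result": "passed"}, {"state": "done", "result": "passed"}],
}
print(latest_finished_build(jobs_by_build))  # ('20220102-1', True)
```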

Actions #15

Updated by okurz over 1 year ago

  • Status changed from New to Rejected
  • Assignee set to okurz

I mentioned the idea in #97118 and am now closing this one as a duplicate.
