coordination #77899: [epic] Extend "auto-review" for failed jobs as well - QA (public) - openSUSE Project Management Tool

coordination #77899

closed

openQA Project (public) - coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

[epic] Extend "auto-review" for failed jobs as well

Added by okurz over 4 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

okurz

Target version:

openQA Project (public) - Ready

Start date:

2020-11-26

Due date:

% Done:

100%

Estimated time:

(Total: 0.00 h)

Description

Motivation¶

SUSE QEM in particular suffers from the workload of manually reviewing openQA test results because of the comparatively high false-positive rate (the product is of higher quality after GM than products still in development before GM). The existing scenario-based "label carry-over" is much less useful for the current setup of QAM scenarios, which are spread over many different job groups. With "auto-review" we have a good solution to handle known incompletes, retrigger automatically where it makes sense, and find new, unknown incompletes easily. As "auto-review" works regardless of the job result and depends only on the list of jobs passed to it, we should evaluate extending it to handle unlabeled failed results as well.

Acceptance criteria¶

AC1: Failed openQA jobs whose log(s) match a regex specified in progress tickets with "auto_review", as already done for incomplete jobs, are labeled with the corresponding ticket
AC2: No gitlab CI pipelines monitored by the team SUSE QE Tools fail if unlabeled, unknown failed jobs are encountered
AC3: Same for o3 and osd
AC4: Power users know about the feature and how it can be used
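
The matching described in AC1 could be sketched roughly as follows. This is a minimal illustration, not the actual auto-review implementation: the function name `label_for_log`, the tab-separated ticket input, the assumed subject convention `auto_review:"<regex>"`, the use of extended regular expressions and the `poo#<id>` label format are all assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper: print a label for the first ticket whose auto_review
# regex matches the given log file.
# Tickets are read from stdin as "<id><TAB><subject>" lines; fetching them
# from the ticket system is deliberately left out of this sketch.
label_for_log() {
    local log_file=$1 id subject regex
    while IFS=$'\t' read -r id subject; do
        # Assumed subject convention: ... auto_review:"<regex>" ...
        regex=$(printf '%s\n' "$subject" | sed -n 's/.*auto_review:"\([^"]*\)".*/\1/p')
        # Skip tickets that carry no auto_review regex
        [ -n "$regex" ] || continue
        if grep -qE -- "$regex" "$log_file"; then
            echo "poo#$id"
            return 0
        fi
    done
    # No ticket matched: the job stays unlabeled
    return 1
}
```

A caller would pipe the ticket list into the function, e.g. `label_for_log autoinst-log.txt < tickets.tsv`, and post the printed label as a job comment only when the function returns success.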

Suggestions¶

Don't fail gitlab CI pipelines when failed jobs are not known, as SUSE QE Tools can't handle that load of unreviewed, new, failed tests and should not be concerned with it
Start with o3 as "testbed" and extend to osd if the process on o3 runs in a convincing way
Consider including the solution within openQA itself, e.g. as plugin, triggering a synchronous action when a job finishes and after automatic label carry-over did not find a convincing candidate
Consider caching tickets to reduce repeated loading from the redmine API while still ensuring that ticket updates, e.g. fixed auto-review regexes, take effect, e.g. only cache for 10s or 1m
Present to power users, e.g. documentation, blog article, feature video, workshop
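
The short-lived ticket cache suggested above could look roughly like this. It is only a sketch under assumptions: the wrapper name `cached`, the cache file location and the command used to fetch tickets are hypothetical, and `stat -c %Y` assumes GNU coreutils as found on Linux hosts.

```shell
#!/bin/bash
# Hypothetical wrapper: print the cached output if the cache file is younger
# than <ttl> seconds, otherwise run the given command and refresh the cache.
# Usage: cached <cache_file> <ttl_seconds> <command> [args...]
cached() {
    local cache_file=$1 ttl=$2
    shift 2
    local now mtime tmp
    now=$(date +%s)
    if [ -f "$cache_file" ]; then
        # Modification time in seconds since the epoch (GNU stat)
        mtime=$(stat -c %Y "$cache_file")
        if [ $((now - mtime)) -lt "$ttl" ]; then
            cat "$cache_file"
            return 0
        fi
    fi
    # Write to a temporary file first so a failed fetch never clobbers the cache
    tmp=$(mktemp "${cache_file}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        mv "$tmp" "$cache_file"
        cat "$cache_file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With a TTL of 10 to 60 seconds, as in `cached /tmp/auto_review_tickets 60 fetch_tickets` (where `fetch_tickets` stands for whatever command loads the tickets), a burst of finishing jobs would share one ticket download, while a corrected regex would still be picked up within a minute.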

Subtasks 4 (0 open — 4 closed)

Related issues 1 (0 open — 1 closed)

Updated by okurz over 4 years ago

Description updated (diff)
Target version set to Ready

Updated by okurz over 4 years ago

Copied to action #77944: Run "auto-review" more often but alarm less added

Updated by okurz over 4 years ago

Parent task set to #39719

Updated by okurz over 4 years ago

Tracker changed from action to coordination
Subject changed from Extend "auto-review" for failed jobs as well to [epic] Extend "auto-review" for failed jobs as well
Description updated (diff)
Status changed from New to Workable

Updated by okurz over 4 years ago

Status changed from Workable to Blocked
Assignee set to okurz

tracking both subtasks

Updated by okurz over 4 years ago

Status changed from Blocked to Workable
Assignee deleted (okurz)

With both current subtasks resolved I see the proof-of-concept successfully in place. As next steps I recommend extending the approach to a selected product or job group on osd as well as to all "non-development" job groups on o3. Anyone can specify the next subtasks for this and follow up in those.

Updated by okurz over 4 years ago

Status changed from Workable to Blocked
Assignee set to okurz

blocked on subtasks

Updated by okurz over 4 years ago

Status changed from Blocked to Workable

All current subtasks resolved. Latest results in #80806#note-4

Switching off the triggers for investigation jobs or even the complete schedule from the gitlab CI pipeline

I disabled both "daily" and "hourly" schedules on
https://gitlab.suse.de/openqa/auto-review/-/pipeline_schedules
with corresponding comments in the schedule names, e.g. "DISABLED:, replaced by job-done-hooks, see https://progress.opensuse.org/issues/77899 - hourly". Let's see if o3 and osd run fine just based on job-done-hooks

Check if auto-review is also correctly triggered for both o3 + osd still

I checked with

for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id, result_dir,t_finished from comments,jobs,users where comments.user_id = users.id and comments.job_id = jobs.id and username ~ 'auto-review' order by id DESC limit 10;\""; done

and the last comments that "auto-review" had to post were from some days ago. So the process seems to work in general.

TODO¶

Update description of epic
Try to post comments as "auto-review", not "geekotest", with corresponding user account keys etc., e.g. check what user "geekotest" is doing; either just use a new api-key and secret, or sudo?

Updated by livdywan over 4 years ago

@okurz Do the ACs still need updating wrt the TODO mentioned? Do we need more subtasks for this epic? I'm asking because it has your name on it, and is Workable... but it's not clear to me if someone else could take the ticket and work on ACs right now.

#10

Updated by okurz over 4 years ago

Status changed from Workable to Resolved

Man, your diligence and attention to detail can be annoying ;)

I crosschecked the epic and everything besides that one point I mentioned is done, covering all ACs. I have mentioned the idea about the different bot user account in #65271#note-38, so we can close this ticket.
