coordination #77899

closed

openQA Project - coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

[epic] Extend "auto-review" for failed jobs as well

Added by okurz over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2020-11-26
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

Motivation

Especially SUSE QEM suffers from the workload of manually reviewing openQA test results due to the comparatively high false-positive rate (as the product is of higher quality after GM than products still in development before GM). The existing scenario-based "label carry-over" is much less useful for the current setup of QAM scenarios, which are spread over many different job groups. With "auto-review" we have a good solution to handle known incompletes, retrigger automatically where it makes sense, and easily find new, unknown incompletes. As "auto-review" works regardless of the job result and only depends on which list of jobs is passed to it, we should evaluate extending it to handle unlabeled failed results as well.

Acceptance criteria

  • AC1: Failed openQA jobs whose log(s) match a regex specified in progress tickets with "auto_review", as is already done for incomplete jobs, are labeled with the corresponding ticket
  • AC2: No gitlab CI pipeline monitored by the SUSE QE Tools team fails when unlabeled, unknown failed jobs are encountered
  • AC3: Same for o3 and osd
  • AC4: Power users know about the feature and how it can be used
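A minimal sketch of the matching described in AC1. It assumes tickets carry their pattern as `auto_review:"<regex>"`, optionally followed by `:retry` to request a retrigger, and it uses POSIX extended regexes via `grep -E`; the production auto-review scripts may use a different regex flavour, and all function names here are hypothetical. Posting the resulting label to openQA is left out.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the auto_review:"<regex>"(:retry) convention in ticket text.

# Print the auto_review regex found in a ticket's text (subject or description), if any.
extract_auto_review_regex() {
    sed -n 's/.*auto_review:"\([^"]*\)".*/\1/p' <<< "$1" | head -n 1
}

# Succeed if the ticket also asks for the job to be retriggered.
wants_retry() {
    grep -q 'auto_review:"[^"]*":retry' <<< "$1"
}

# Print the ID of the first ticket whose regex matches the job log; fail if none does.
# Arguments: log file, then pairs of ticket ID and ticket text.
find_matching_ticket() {
    local log=$1 id text regex
    shift
    while (( $# >= 2 )); do
        id=$1 text=$2
        shift 2
        regex=$(extract_auto_review_regex "$text")
        [[ -n $regex ]] || continue   # ticket without auto_review pattern
        if grep -qE -- "$regex" "$log"; then
            echo "$id"
            return 0
        fi
    done
    return 1
}
```

For example, `find_matching_ticket autoinst-log.txt 1234 "$(cat ticket-1234.txt)"` would print `1234` if that ticket's regex matches the log, and return non-zero otherwise, leaving the job as an unknown failure.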

Suggestions

  • Don't fail gitlab CI pipelines when failed jobs are unknown, as SUSE QE Tools can't handle that load of unreviewed, new failed tests and should not be concerned with them
  • Start with o3 as "testbed" and extend to osd if the process on o3 runs in a convincing way
  • Consider including the solution within openQA itself, e.g. as a plugin, triggering a synchronous action when a job finishes and automatic label carry-over did not find a convincing candidate
  • Consider caching tickets to reduce repeated loading from the redmine API while still ensuring that ticket updates, e.g. fixed auto-review regexes, take effect, e.g. by caching only for 10s or 1m
  • Present to power users, e.g. documentation, blog article, feature video, workshop
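A minimal sketch of the time-limited ticket cache suggested above. It assumes tickets are fetched as JSON from the Redmine REST API (`/issues/<id>.json`) and that GNU coreutils are available for `stat -c %Y`; the cache directory, the variable names and the helper functions are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: cache location and TTL are illustrative assumptions.
CACHE_DIR="${CACHE_DIR:-/tmp/auto-review-cache}"
CACHE_TTL="${CACHE_TTL:-60}"   # seconds, e.g. 10s or 1m as suggested

# Fetch one ticket as JSON from the Redmine REST API.
fetch_ticket() {
    curl -sf "https://progress.opensuse.org/issues/$1.json"
}

# Print a ticket, reusing the cached copy while it is younger than CACHE_TTL seconds.
cached_ticket() {
    local id=$1 file="$CACHE_DIR/$1.json" age
    mkdir -p "$CACHE_DIR"
    if [[ -f $file ]]; then
        age=$(( $(date +%s) - $(stat -c %Y "$file") ))
        if (( age < CACHE_TTL )); then
            cat "$file"
            return 0
        fi
    fi
    # Write to a temporary file first so a failed fetch never replaces a good cache entry.
    fetch_ticket "$id" > "$file.tmp" && mv "$file.tmp" "$file" && cat "$file"
}
```

A short TTL keeps API load low when many jobs finish at once, while a fixed auto-review regex still takes effect within at most one TTL.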

Subtasks 4 (0 open, 4 closed)

  • action #80414: [proof-of-concept] Extend "auto-review" for failed jobs as well, start with o3 (Resolved, okurz, 2020-11-26)
  • action #80418: [learning] Fix parse errors in "openqa-investigate" "parse error: Invalid numeric literal at line 1, column 10" (Resolved, mkittler, 2020-11-26)
  • action #80806: Extend "auto-review" for failed jobs as well - Generalize openqa-monitor-investigation-candidates to look at more than just one job group (Resolved, okurz, 2020-12-07)
  • action #80808: Extend "auto-review" for failed jobs as well - enable same as on o3 but on osd (Resolved, okurz, 2020-12-07)

Related issues 1 (0 open, 1 closed)

  • Copied to QA - action #77944: Run "auto-review" more often but alarm less (Resolved, okurz, 2020-11-14)

#1

Updated by okurz over 3 years ago

  • Description updated (diff)
  • Target version set to Ready
#2

Updated by okurz over 3 years ago

  • Copied to action #77944: Run "auto-review" more often but alarm less added
#3

Updated by okurz over 3 years ago

  • Parent task set to #39719
#4

Updated by okurz over 3 years ago

  • Tracker changed from action to coordination
  • Subject changed from Extend "auto-review" for failed jobs as well to [epic] Extend "auto-review" for failed jobs as well
  • Description updated (diff)
  • Status changed from New to Workable
#5

Updated by okurz over 3 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

tracking both subtasks

#6

Updated by okurz over 3 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

With both current subtasks resolved, I consider the proof of concept successfully in place. As next steps I recommend extending the approach to a selected product or job group on osd as well as to all "non-development" job groups on o3. Anyone can define the next subtasks for this and follow up on them.

#7

Updated by okurz over 3 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

blocked on subtasks

#8

Updated by okurz over 3 years ago

  • Status changed from Blocked to Workable

All current subtasks resolved. Latest results in #80806#note-4

  • Switching off the triggers for investigation jobs or even the complete schedule from the gitlab CI pipeline

I disabled both the "daily" and "hourly" schedules on
https://gitlab.suse.de/openqa/auto-review/-/pipeline_schedules
with corresponding comments in the schedule names, e.g. "DISABLED:, replaced by job-done-hooks, see https://progress.opensuse.org/issues/77899 - hourly". Let's see if o3 and osd run fine based on job-done-hooks alone.
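A sketch of the decision such a job-done hook could make, following the earlier suggestion to act only when label carry-over did not find a convincing candidate. How the hook learns the job result and the carry-over outcome is an assumption left out here, and `hook_action` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: inputs are assumed to be provided by the caller.

# Print which action to take for a finished job.
# Arguments: job result, and "yes" if label carry-over already labeled the job.
hook_action() {
    local result=$1 already_labeled=$2
    if [[ $already_labeled == yes ]]; then
        echo skip   # carry-over found a convincing candidate
        return
    fi
    case $result in
        incomplete|failed) echo auto-review ;;  # look for matching auto_review tickets
        *) echo skip ;;                         # e.g. passed or softfailed jobs need no review
    esac
}
```

Keeping this decision in one small function makes it easy to extend the set of handled results, e.g. starting with incompletes only and adding failed jobs later as this epic does.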

  • Check if auto-review is also correctly triggered for both o3 + osd still

I checked with

for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id, result_dir,t_finished from comments,jobs,users where comments.user_id = users.id and comments.job_id = jobs.id and username ~ 'auto-review' order by id DESC limit 10;\""; done

and the last comment "auto-review" had to post was a few days ago, so the process seems to work in general.
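The manual check above could be reduced to a single number per instance. This sketch reuses the same ssh and `sudo -u geekotest psql` access, adds psql's `-t -A` flags for bare output, and assumes a `comments.t_created` column stored in UTC as well as GNU `date`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: schema column and UTC storage are assumptions.

# Print the timestamp of the newest comment by an "auto-review" user on the given host.
newest_auto_review_comment() {
    ssh "$1" "sudo -u geekotest psql openqa -t -A -c \"select max(comments.t_created) from comments join users on comments.user_id = users.id where users.username ~ 'auto-review';\""
}

# Print whole days between a UTC timestamp and "now" (epoch seconds, defaults to the current time).
age_in_days() {
    local start now
    start=$(date -u -d "$1" +%s) || return 1
    now=${2:-$(date -u +%s)}
    echo $(( (now - start) / 86400 ))
}
```

Usage: `for i in o3 osd; do echo "$i: $(age_in_days "$(newest_auto_review_comment "$i")") days"; done`.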

TODO

  • Update description of epic
  • Try to post comments as "auto-review" rather than "geekotest", with corresponding user account keys etc., e.g. check what user "geekotest" is doing: either just use a new API key and secret, or sudo?
#9

Updated by livdywan about 3 years ago

@okurz Do the ACs still need updating wrt the TODO mentioned? Do we need more subtasks for this epic? I'm asking because it has your name on it, and is Workable... but it's not clear to me if someone else could take the ticket and work on ACs right now.

#10

Updated by okurz about 3 years ago

  • Status changed from Workable to Resolved

Man, your diligence and attention to detail can be annoying ;)

I cross-checked the epic, and everything besides that one point I mentioned is done, covering all ACs. I have mentioned the idea about the different bot user account in #65271#note-38, so we can close this ticket.
