coordination #77899
closed
openQA Project (public) - coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
[epic] Extend "auto-review" for failed jobs as well
Added by okurz about 4 years ago.
Updated almost 4 years ago.
Estimated time:
(Total: 0.00 h)
Description
Motivation
SUSE QEM in particular suffers from the workload of manually reviewing openQA test results because of the comparatively high false-positive rate (the product is of higher quality after GM than products in development before GM). The existing scenario-based "label carry-over" is much less useful for the current setup of QAM scenarios, which are spread over many different job groups. With "auto-review" we have a good solution to handle known incompletes, retrigger automatically where it makes sense, and easily find new, unknown incompletes. As "auto-review" works regardless of the job result and depends only on the list of jobs passed to it, we should evaluate extending it to handle unlabeled failed results as well.
Acceptance criteria
- AC1: Failed openQA jobs whose log(s) match a regex specified in a progress ticket with "auto_review" are labeled with the corresponding ticket, as is already done for incomplete jobs
- AC2: No GitLab CI pipeline monitored by the SUSE QE Tools team fails when unlabeled, unknown failed jobs are encountered
- AC3: Same for o3 and osd
- AC4: Power users know about the feature and how it can be used
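The matching described in AC1 can be illustrated with a minimal Python sketch. It is not the actual auto-review implementation. It assumes that a ticket subject carries the regex in the form auto_review:"<regex>" with an optional ":retry" suffix; that convention, the function name find_matching_ticket, and the ticket numbers and subjects are all hypothetical and chosen only for illustration.

```python
import re

# Assumed subject convention (not confirmed by this ticket):
#   auto_review:"<regex>"  optionally followed by  :retry
AUTO_REVIEW = re.compile(r'auto_review:"(?P<regex>.+?)"(?P<retry>:retry)?')


def find_matching_ticket(log_text, tickets):
    """Return (ticket_id, retry) for the first ticket whose regex matches
    the job log, or None if the failure is unknown."""
    for ticket_id, subject in tickets:
        marker = AUTO_REVIEW.search(subject)
        if marker is None:
            continue  # ticket does not take part in auto-review
        if re.search(marker.group("regex"), log_text):
            return ticket_id, marker.group("retry") is not None
    return None


# Hypothetical tickets, for illustration only.
tickets = [
    (1001, '[sporadic] auto_review:"Connection timed out":retry network issue'),
    (1002, 'auto_review:"needle .* not found" missing needle'),
    (1003, 'ticket without an auto_review marker'),
]

log = "test died: needle bootloader not found after 90s"
match = find_matching_ticket(log, tickets)
if match is None:
    # In line with AC2: report, but do not treat an unknown failure as an error.
    print("unknown failure, left unlabeled")
else:
    print("label with ticket #%d (retry=%s)" % match)
```

Returning None for unknown failures, rather than raising, keeps the decision about pipeline status with the caller, which is what AC2 asks for.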
Suggestions
- Don't fail GitLab CI pipelines when failed jobs are not known: SUSE QE Tools can't handle that load of unreviewed, new, failed tests and should not be concerned with it
- Start with o3 as "testbed" and extend to osd if the process on o3 runs in a convincing way
- Consider including the solution within openQA itself, e.g. as plugin, triggering a synchronous action when a job finishes and after automatic label carry-over did not find a convincing candidate
- Consider caching tickets to reduce repeated loading from the Redmine API while still ensuring that ticket updates, e.g. fixed auto-review regexes, take effect quickly, e.g. only cache for 10s or 1m
- Present to power users, e.g. documentation, blog article, feature video, workshop
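The short-lived ticket cache suggested above could look roughly like the following Python sketch. It is an illustration under assumptions, not part of auto-review: the class name TicketCache, the fetch callable (standing in for whatever queries the Redmine API), and the 10-second default are all hypothetical.

```python
import time


class TicketCache:
    """Cache the result of an expensive ticket lookup for a short time.

    `fetch` is a hypothetical callable that loads the auto-review tickets,
    e.g. from the Redmine API. `clock` is injectable so that expiry can be
    tested without sleeping.
    """

    def __init__(self, fetch, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: reload so that edited regexes are seen.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Usage with a stand-in fetch function and a fake clock.
fetch_calls = []
fake_now = [0.0]


def fake_fetch():
    fetch_calls.append(1)
    return ["tickets-v%d" % len(fetch_calls)]


cache = TicketCache(fake_fetch, ttl_seconds=10.0, clock=lambda: fake_now[0])
first = cache.get()       # loads from the (fake) API
fake_now[0] = 5.0
second = cache.get()      # still fresh, served from the cache
fake_now[0] = 10.0
third = cache.get()       # expired, loaded again
```

With a time-to-live in the range of seconds to a minute, a burst of finished jobs shares one ticket query, while a corrected regex still takes effect after at most one expiry interval.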
- Description updated (diff)
- Target version set to Ready
- Copied to action #77944: Run "auto-review" more often but alarm less added
- Parent task set to #39719
- Tracker changed from action to coordination
- Subject changed from Extend "auto-review" for failed jobs as well to [epic] Extend "auto-review" for failed jobs as well
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to Blocked
- Assignee set to okurz
- Status changed from Blocked to Workable
- Assignee deleted (okurz)
With both current subtasks resolved I see the proof-of-concept successfully in place. As next steps I recommend extending the approach to a selected product or job group on osd as well as all "non-development" job groups on o3. Anyone can specify the next subtasks for this and follow up in them.
- Status changed from Workable to Blocked
- Assignee set to okurz
- Status changed from Blocked to Workable
All current subtasks resolved. Latest results in #80806#note-4
- Switching off the triggers for investigation jobs or even the complete schedule from the gitlab CI pipeline
I disabled both "daily" and "hourly" schedules on
https://gitlab.suse.de/openqa/auto-review/-/pipeline_schedules
with corresponding comments in the schedule names, e.g. "DISABLED:, replaced by job-done-hooks, see https://progress.opensuse.org/issues/77899 - hourly". Let's see if o3 and osd run fine just based on job-done-hooks
- Check if auto-review is also correctly triggered for both o3 + osd still
I checked with
for i in o3 osd; do ssh $i "sudo -u geekotest psql openqa -c \"select jobs.id, result_dir,t_finished from comments,jobs,users where comments.user_id = users.id and comments.job_id = jobs.id and username ~ 'auto-review' order by id DESC limit 10;\""; done
and the last comments that "auto-review" had to post were from some days ago. So the process seems to work in general.
TODO
- Update description of epic
- Try to post comments as "auto-review" instead of "geekotest", with a corresponding user account, keys, etc. For example, check what user "geekotest" is doing, and either just use a new API key and secret, or sudo?
@okurz Do the ACs still need updating wrt the TODO mentioned? Do we need more subtasks for this epic? I'm asking because it has your name on it, and is Workable... but it's not clear to me if someone else could take the ticket and work on ACs right now.
- Status changed from Workable to Resolved
Man, your diligence and attention to detail can be annoying ;)
I crosschecked the epic and everything besides that one point I mentioned is done, covering all ACs. I have mentioned the idea about the different bot user account in #65271#note-38, so we can close this ticket.