action #77944
openQA Project - coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier
Run "auto-review" more often but alarm less
Description
Motivation
auto-review does a good job but borders on annoying because it sends too many alerts. On the other hand, we could benefit from running auto-review more often. We should consider counting the gitlab CI pipeline as passed as long as fewer than N unknown, new issues show up. Potentially we should not fail the gitlab CI pipeline at all as an alert but alarm differently, e.g. just in grafana.
Acceptance criteria
- AC1: gitlab CI pipeline only fails if more than N unknown, new issues show up
- AC2: auto-review pipeline runs more often
Suggestions
- Don't fail gitlab CI pipelines when jobs have unknown issues, or introduce a threshold N
- Adjust gitlab CI pipeline schedule
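A minimal sketch of the threshold idea from AC1, assuming the pipeline step already knows how many unknown, new issues it found. The function name `pipeline_exit_code` and its parameters are hypothetical and not part of auto-review; a CI job could pass the returned value to `sys.exit()` so that it only fails above the threshold.

```python
def pipeline_exit_code(unknown_issue_count: int, threshold: int) -> int:
    """Return the exit code a CI job could use (hypothetical helper).

    0 means "pass", 1 means "fail". The job fails only if strictly more
    than `threshold` unknown, new issues were found, as worded in AC1.
    """
    if unknown_issue_count < 0 or threshold < 0:
        raise ValueError("counts must not be negative")
    return 1 if unknown_issue_count > threshold else 0


def should_alert_elsewhere(unknown_issue_count: int) -> bool:
    """Assumption: any unknown issue is still worth a non-failing alert,
    e.g. on a dashboard, even when the pipeline itself passes."""
    return unknown_issue_count > 0
```

With a threshold of 0 this behaves like a pipeline that fails on any unknown issue, so the threshold could be introduced without changing behavior at first and raised later.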
History
#1
Updated by okurz 2 months ago
- Copied from coordination #77899: [epic] Extend "auto-review" for failed jobs as well added
#2
Updated by okurz about 2 months ago
- Estimated time set to 39719.00 h
#3
Updated by okurz about 2 months ago
- Parent task set to #39719
#4
Updated by okurz about 2 months ago
- Estimated time deleted (39719.00 h)
#5
Updated by okurz about 2 months ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Unfortunately our users of (at least) o3 and osd are suffering from many unhandled incompletes related to #67000. As nobody has a better idea, I have enabled the "hourly" schedule in https://gitlab.suse.de/openqa/auto-review/-/pipeline_schedules . This will send more email alerts for any new unhandled incompletes which need a ticket.
#6
Updated by okurz about 2 months ago
- Related to action #76912: OpenQA::Script::Client throws perl warning "Wide character in print", should not be there added
#7
Updated by okurz about 2 months ago
- Description updated (diff)
- Status changed from In Progress to Feedback
With the patch for #76912 already applied while the PR is pending, the pipeline ran just fine overnight.
We can still consider preventing repeated alerts. How about we add a mode to openqa-label-known-issues to write "unknown issue" (similar to what TTM already does as well) and/or optionally just retrigger automatically? Then, in a separate step, we can still check for the number of incompletes with the text "unknown issue" or incompletes which still have no clone.
- TODO: @waitfor #76912
- TODO: consider the mentioned extensions to openqa-label-known-issues
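The separate counting step proposed in this comment could look roughly like the sketch below. It works on plain dictionaries, not on a real openQA client; the keys `result`, `comments` and `clone_id` and the helper name `count_unhandled_incompletes` are assumptions made for illustration only.

```python
from typing import Any, Iterable, Mapping

# Marker text the proposed labelling mode would write as a job comment.
UNKNOWN_MARKER = "unknown issue"


def count_unhandled_incompletes(jobs: Iterable[Mapping[str, Any]]) -> int:
    """Count incomplete jobs that are labelled "unknown issue" or that
    still have no clone (hypothetical data layout, see lead-in)."""
    count = 0
    for job in jobs:
        if job.get("result") != "incomplete":
            continue  # only incompletes are of interest here
        labelled_unknown = any(
            UNKNOWN_MARKER in text for text in job.get("comments", [])
        )
        has_clone = job.get("clone_id") is not None
        if labelled_unknown or not has_clone:
            count += 1
    return count
```

An alert would then depend on this count instead of on every single unlabelled job, which is what would prevent repeated alerts for the same incomplete.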
#8
Updated by okurz about 2 months ago
Had an idea: https://github.com/os-autoinst/openQA/pull/3603 . This could also belong in another ticket about "integrate auto-review more closely into openQA".
#9
Updated by okurz about 2 months ago
- Status changed from Feedback to In Progress
#10
Updated by okurz about 2 months ago
- Copied to action #80736: Trigger 'auto-review' from within openQA when jobs incomplete (or fail) , for testing: auto_review:"tests died: unable to load main.pm, check the log for the cause" added
#11
Updated by okurz about 2 months ago
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/openQA/pull/3603 is merged. I plan to conduct the rollout to o3 and potentially osd in #80736.
The original idea for this ticket was to run auto-review more often, which works fine so far.