action #77887

Updated by okurz 10 months ago

## Motivation

SUSE QEM is challenged by a relatively high false-positive rate of openQA tests. This is due to the objectively higher quality of released products compared to products in development, i.e. pre-GM SLE and Tumbleweed snapshots before release. We already use "openqa-investigate" on o3, where it has been running for multiple months and I have received positive feedback. We can now extend the solution to osd.

## Acceptance criteria
* **AC1:** Automatic investigation jobs are triggered and commented on new, unlabeled failures within osd production groups, i.e. not development groups
* **AC2:** Automatic investigation jobs run within a reasonable time to provide useful feedback to reviewers in their regular review routines
* **AC3:** No harmful performance impact on infrastructure due to too many automatic investigation jobs

## Suggestions
* There is a script-based solution "openqa-investigate" which we already use automatically within o3, see
* Try out dry-runs of "openqa-investigate" against OSD and check for obvious things missing or going wrong, e.g. immediate errors or crashes or unreasonable results
* Extending to osd could be as simple as adding a similar block to .gitlab-ci.yml for osd
* Once activated, monitor over a couple of days for usefulness and performance impact
* Consider changing the schedule of the scheduled pipeline, e.g. trigger more often over the day, or even "continuous" :)
* Optional: Ensure that auto-review first walks over all issues, potentially even failed ones, to detect known issues, and only trigger investigation jobs for failures that are unknown to auto-review
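
The selection logic implied by AC1, AC3 and the optional suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions, not how "openqa-investigate" or auto-review is implemented: the `Job` class, the `is_known_issue` callback, the `limit` parameter and the group names are all hypothetical and do not correspond to the real openQA API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Job:
    """Hypothetical, simplified view of an openQA job (not the real API schema)."""
    id: int
    group: str
    result: str
    labeled: bool = False  # assumption: True if a reviewer or auto-review already labeled it


def select_for_investigation(
    jobs: Iterable[Job],
    production_groups: Set[str],
    is_known_issue: Callable[[Job], bool],
    limit: int,
) -> List[Job]:
    """Pick failed, unlabeled jobs in production groups that auto-review does not know.

    `limit` caps the number of selected jobs per run so that investigation
    jobs cannot overload the infrastructure (AC3).
    """
    selected = []
    for job in jobs:
        if job.result != "failed":
            continue  # only failures are of interest
        if job.group not in production_groups:
            continue  # skip development groups (AC1)
        if job.labeled:
            continue  # already reviewed, nothing to investigate
        if is_known_issue(job):
            continue  # auto-review goes first; known issues need no investigation
        selected.append(job)
        if len(selected) >= limit:
            break
    return selected
```

A dry run in this spirit would only print the selected job IDs instead of triggering anything, which makes it easy to check for unreasonable results before enabling the pipeline on osd.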