coordination #94105
Updated by okurz over 2 years ago
## User stories
* **US1:** As a test writer introducing regressions in test code I want to be informed on my pull request in case it is unambiguously identified as the culprit for failing openQA tests so that I can quickly provide a regression fix
* **US2:** As a test writer I want to be informed on my pull request in case a failing openQA test is identified to be a clear test regression with multiple git commits as candidates that likely introduced the regression so that I can cross-check whether my PR could have caused the failure
* **US3:** As a QE squad PO I would like to receive a ticket in the openQA tests issue tracker in case a failing openQA test is identified to be a clear test regression so that we can plan on fixing that issue
* **US4:** As an openQA infrastructure admin I would like to receive a ticket in the openQA infrastructure issue tracker in case a failing openQA test is identified to be a clear infrastructure regression so that we can plan on fixing that issue
* **US5:** As a maintenance coordination engineer creating maintenance (release) requests I want to receive a notification in case a failing openQA test is identified to be a clear product regression, i.e. if submitted changes trigger the problem, to be able to fix my submission
* **US6:** As an openQA test reviewer I want openqa-investigate to automatically create auto-review tickets that handle restarting tests that fail for the same reason so that I do not need to retrigger manually while test maintainers have time to fix the sporadic issue
## Suggestions
* Incorporate https://progress.opensuse.org/projects/openqav3/wiki/#Categorization-scheme into an automatic decision tree with specific actions
* Consider the different states resulting from the combination of openqa-investigate job results ("X" meaning failed, "V" passed, "O" other failure; `2^4=16` possible combinations of results):
  * S0: retry X, last_good_test X, last_good_build X, last_good_test+build X -> infrastructure issue => report ticket in progress.opensuse.org/projects/openqa-infrastructure/issues/ with e.g. "Urgent" priority
  * S1: retry X, last_good_test X, last_good_build X, last_good_test+build V -> sporadic issue => see S8-15
  * S2: retry X, last_good_test X, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
  * S3: retry X, last_good_test X, last_good_build V, last_good_test+build V -> reproducible product issue => for a QAM test write a comment on IBS/OBS or smelt, for a non-QAM test report a product bug
  * S4: retry X, last_good_test V, last_good_build X, last_good_test+build X -> sporadic issue => see S8-15
  * S5: retry X, last_good_test V, last_good_build X, last_good_test+build V -> reproducible test regression => bisect git log, inform on pull request, report ticket in progress.opensuse.org/projects/openqatests/issues/
  * S6: retry X, last_good_test V, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
  * S7: retry X, last_good_test V, last_good_build V, last_good_test+build V -> sporadic issue => see S8-15
  * S8-15: retry V (all 8 combinations) -> sporadic issue => automatically create auto-review tickets that handle restarting tests and retrigger the original job -> first step #94105
* Intermediate steps from weekly 2021-07-16:
  * We can always start with just writing yet another comment on openQA jobs
  * Simplified approach: if last_good+build fails, a product regression is unlikely; post a comment stating that
  * If last_good_test+build fails, an infrastructure issue is likely
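
The state list above can be read as a 4-bit lookup table, which may help when turning it into an automatic decision tree. The following is only a minimal sketch under stated assumptions: the names `Category`, `JOBS`, `NON_SPORADIC`, `state_number` and `classify` are hypothetical and not part of openqa-investigate, the results are assumed to be already available as "X" or "V" per job, and "O" (other failure) is rejected because the 16 states above do not cover it.

```python
from enum import Enum


class Category(Enum):
    INFRASTRUCTURE = "infrastructure issue"
    PRODUCT = "reproducible product issue"
    TEST_REGRESSION = "reproducible test regression"
    SPORADIC = "sporadic issue"


# The four investigation jobs, in the order used by the state list (S0..S15).
JOBS = ("retry", "last_good_test", "last_good_build", "last_good_test+build")

# Only three of the 16 states are not classified as sporadic.
NON_SPORADIC = {
    0: Category.INFRASTRUCTURE,   # S0: X X X X
    3: Category.PRODUCT,          # S3: X X V V
    5: Category.TEST_REGRESSION,  # S5: X V X V
}


def state_number(results):
    """Return n for state Sn; results maps each job name to "X" or "V"."""
    number = 0
    for job in JOBS:
        outcome = results[job]
        if outcome not in ("X", "V"):
            # Assumption: "O" (other failure) is outside the 16 listed states.
            raise ValueError(f"unsupported outcome {outcome!r} for {job}")
        # A passed job contributes a 1 bit, retry being the most significant.
        number = (number << 1) | int(outcome == "V")
    return number


def classify(results):
    """Map the four job results to the category named in the state list."""
    return NON_SPORADIC.get(state_number(results), Category.SPORADIC)


example = dict(zip(JOBS, "XVXV"))
print(state_number(example), classify(example).value)
```

Because "retry" is the most significant bit, every state from S8 to S15 (retry passed) falls through to the sporadic default without needing its own entry. The follow-up actions (ticket creation, pull request comments, retriggering) are deliberately left out of the sketch.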