coordination #94105
openQA Project - coordination #102915: [saga][epic] Automated classification of failures
[epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Description
User stories
- US1: As a test writer introducing regressions in test code, I want to be informed on my pull request in case my pull request is unambiguously identified as the culprit for failing openQA tests, so that I can quickly provide a regression fix
- US2: As a test writer, I want to be informed on my pull request in case a failing openQA test is identified as a clear test regression with multiple git commits as candidates that likely introduced the regression, so that I can cross-check whether my PR could have caused the failure
- US3: As a QE squad PO, I would like to receive a ticket in the openQA tests issue tracker in case a failing openQA test is identified as a clear test regression, so that we can plan on fixing that issue
- US4: As an openQA infrastructure admin, I would like to receive a ticket in the openQA infrastructure issue tracker in case a failing openQA test is identified as a clear infrastructure regression, so that we can plan on fixing that issue
- US5: As a maintenance coordination engineer creating maintenance (release) requests, I want to receive a notification in case a failing openQA test is identified as a clear product regression, i.e. if the submitted changes trigger the problem, so that I can fix my submission
- US6: As an openQA test reviewer, I want openqa-investigate to automatically create auto-review tickets that handle restarting tests that fail for the same reason, so that I do not need to retrigger them manually while test maintainers have time to fix the sporadic issue
Suggestions
- Incorporate https://progress.opensuse.org/projects/openqav3/wiki/#Categorization-scheme into an automatic decision tree with specific actions
Consider the different result states from the combination of openqa-investigate jobs ("X" meaning failed, "V" passed, "O" other failure; with only X and V this gives 2^4 = 16 possible combinations of results):
- S0: retry X, last_good_test X, last_good_build X, last_good_test+build X -> infrastructure issue => report a ticket in progress.opensuse.org/projects/openqa-infrastructure/issues/ with e.g. "Urgent" priority
- S1: retry X, last_good_test X, last_good_build X, last_good_test+build V -> sporadic issue => see S8-15
- S2: retry X, last_good_test X, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
- S3: retry X, last_good_test X, last_good_build V, last_good_test+build V -> reproducible product issue => if it is a QAM test, write a comment on IBS/OBS or smelt; for non-QAM tests, report a product bug
- S4: retry X, last_good_test V, last_good_build X, last_good_test+build X -> sporadic issue => see S8-15
- S5: retry X, last_good_test V, last_good_build X, last_good_test+build V -> reproducible test regression => bisect git log, inform on pull request, report ticket in progress.opensuse.org/projects/openqatests/issues/
- S6: retry X, last_good_test V, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
- S7: retry X, last_good_test V, last_good_build V, last_good_test+build V -> sporadic issue => see S8-15
- S8-15: retry V (all 8 combinations) -> sporadic issue => automatically create auto-review tickets that handle restarting tests and retrigger the original job -> first step #94105
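The state table above can be expressed as a small decision function. The following is a minimal Python sketch, not part of openqa-investigate: all names (`InvestigationResults`, `Category`, `ACTIONS`, `state_number`, `classify`) are hypothetical, each investigation job is reduced to passed/failed, and the "O" (other failure) state is not modeled. `state_number` reproduces the S0 to S15 numbering by treating retry, last_good_test, last_good_build and last_good_test+build as binary digits with weights 8, 4, 2 and 1 (V = 1).

```python
from enum import Enum
from typing import NamedTuple


class Category(Enum):
    INFRASTRUCTURE = "infrastructure issue"
    PRODUCT = "reproducible product issue"
    TEST_REGRESSION = "reproducible test regression"
    SPORADIC = "sporadic issue"


class InvestigationResults(NamedTuple):
    """Results of the four openqa-investigate jobs; True means passed ("V")."""
    retry: bool
    last_good_test: bool
    last_good_build: bool
    last_good_test_and_build: bool


# Hypothetical follow-up actions, paraphrased from the state table
ACTIONS = {
    Category.INFRASTRUCTURE: "report ticket in the openqa-infrastructure tracker, e.g. Urgent",
    Category.PRODUCT: "QAM: comment on IBS/OBS or smelt; non-QAM: report product bug",
    Category.TEST_REGRESSION: "bisect git log, inform on pull request, report openqatests ticket",
    Category.SPORADIC: "create auto-review ticket that restarts tests, retrigger original job",
}


def state_number(r: InvestigationResults) -> int:
    """Map results to S0..S15 using binary weights retry=8, test=4, build=2, test+build=1."""
    return (8 * r.retry + 4 * r.last_good_test
            + 2 * r.last_good_build + r.last_good_test_and_build)


def classify(r: InvestigationResults) -> Category:
    # S8-15: a passing retry means the failure is not reproducible
    if r.retry:
        return Category.SPORADIC
    states = (r.last_good_test, r.last_good_build, r.last_good_test_and_build)
    if states == (False, False, False):  # S0: everything fails
        return Category.INFRASTRUCTURE
    if states == (False, True, True):  # S3: only the old build passes
        return Category.PRODUCT
    if states == (True, False, True):  # S5: only the old test code passes
        return Category.TEST_REGRESSION
    # S1, S2, S4, S6, S7: inconsistent results
    return Category.SPORADIC
```

For example, `classify(InvestigationResults(False, True, False, True))` corresponds to S5 and returns `Category.TEST_REGRESSION`, after which `ACTIONS[...]` gives the follow-up step. Checking the retry first mirrors the table: every passing retry (S8-15) is treated as sporadic regardless of the other three results.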
For sporadic issues, bisect on all worker class settings in the "last_good vs. first_bad" diff except the scheduled one. For example, for a job scheduled against qemu_x86_64 where the worker of the last good job has "worker1,foo,bar" and the worker of the first bad job has "worker2,foo,baz", retrigger against worker1 as well as bar and baz to check for the impact of these sub-classes.
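Finding the bisection candidates is a set difference over comma-separated worker classes. A minimal Python sketch; the helpers `parse_classes` and `bisect_candidates` are hypothetical and only compute which classes to retrigger against, not how the retrigger is done:

```python
from typing import List, Set


def parse_classes(worker_class: str) -> Set[str]:
    """Split a comma-separated worker class string into a set of class names."""
    return {c.strip() for c in worker_class.split(",") if c.strip()}


def bisect_candidates(last_good: str, first_bad: str, scheduled: str) -> List[str]:
    """Worker classes that differ between the last good and the first bad job.

    Classes from the scheduled WORKER_CLASS are excluded because every
    worker that can pick up the job has them anyway.
    """
    # Symmetric difference: present on only one of the two workers
    differing = parse_classes(last_good) ^ parse_classes(first_bad)
    return sorted(differing - parse_classes(scheduled))
```

With the example above, `bisect_candidates("worker1,foo,bar", "worker2,foo,baz", "qemu_x86_64")` returns `['bar', 'baz', 'worker1', 'worker2']`. The sketch keeps the full symmetric difference; the example in the text does not list worker2, so if that host-specific class should be skipped, filter it out before retriggering.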
For sporadic issues, calculate the fail ratio and run as many tests as needed to reach a statistically significant number.
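One possible reading of "statistically significant" is: given an observed fail ratio p, how many consecutive passing runs are needed before we can assume the issue is gone. Assuming each run fails independently with probability p, n passes in a row happen with probability (1 - p)^n, so we want the smallest n with (1 - p)^n <= alpha, i.e. n >= ln(alpha) / ln(1 - p). The Python sketch below uses hypothetical helpers `fail_ratio` and `runs_needed` with a default alpha of 0.05; it is one criterion among several, not a method prescribed by this ticket.

```python
import math


def fail_ratio(failed: int, total: int) -> float:
    """Observed fail ratio of a set of reruns."""
    if total <= 0:
        raise ValueError("need at least one run")
    return failed / total


def runs_needed(p: float, alpha: float = 0.05) -> int:
    """Smallest n with (1 - p) ** n <= alpha, assuming independent runs.

    After a fix, n consecutive passes make it unlikely (probability <= alpha)
    that the failure with ratio p is still present but went unnoticed.
    """
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    # ln(1 - p) is negative, so dividing flips the inequality to n >= ...
    return math.ceil(math.log(alpha) / math.log(1 - p))
```

For instance, 3 failures in 20 reruns give a fail ratio of 0.15, and `runs_needed(0.15)` returns 19: only after 19 consecutive passes does the chance of seeing all passes despite a still-present 15% failure rate drop below 5%.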
Intermediate steps from the weekly meeting on 2021-07-16:
- We can always start by just writing yet another comment on the openQA jobs
Simplified approach: If last_good_build fails, a product regression is unlikely; state that in a comment.
If last_good_test+build fails, it is likely an infrastructure issue.
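The simplified approach could produce the text of such an openQA job comment. A minimal Python sketch, assuming that "last_good+build" refers to the last_good_build investigation job; the function name `simplified_comment` and the comment wording are hypothetical:

```python
from typing import List, Optional


def simplified_comment(last_good_build_passed: bool,
                       last_good_test_and_build_passed: bool) -> Optional[str]:
    """Build a short comment for the original openQA job, or None if nothing to say."""
    notes: List[str] = []
    # Current test code also fails on the last good build: the new build is probably not to blame
    if not last_good_build_passed:
        notes.append("last_good_build failed as well: a product regression is unlikely.")
    # Even the last good test code on the last good build fails: points at the infrastructure
    if not last_good_test_and_build_passed:
        notes.append("last_good_test+build failed as well: likely an infrastructure issue.")
    return " ".join(notes) or None
```

Returning `None` when both investigation jobs pass keeps the heuristic from adding noise to jobs where it has nothing to say.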