coordination #94105

open

openQA Project - coordination #102915: [saga][epic] Automated classification of failures

[epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Added by okurz over 3 years ago. Updated 7 months ago.

Status: Blocked
Priority: Normal
Assignee:
Target version:
Start date: 2021-07-20
Due date:
% Done: 85%
Estimated time: (Total: 40.00 h)

Description

User stories

  • US1: As a test writer introducing regressions in test code I want to be informed on my pull request in case my pull request is unambiguously identified as the culprit for failing openQA tests so that I can quickly provide a regression fix
  • US2: As a test writer I want to be informed on my pull request in case a failing openQA test is identified to be a clear test regression with multiple git commits as candidates that likely introduced a regression so that I can crosscheck my PR if it could have caused the failure
  • US3: As a QE squad PO I would like to receive a ticket in the openQA tests issue tracker in case a failing openQA test is identified to be a clear test regression so that we can plan on fixing that issue
  • US4: As an openQA infrastructure admin I would like to receive a ticket in the openQA infrastructure issue tracker in case a failing openQA test is identified to be a clear infrastructure regression so that we can plan on fixing that issue
  • US5: As a maintenance coordination engineer creating maintenance (release) requests I want to receive a notification in case a failing openQA test is identified to be a clear product regression, i.e. if submitted changes trigger the problem, to be able to fix my submission
  • US6: As an openQA test reviewer I want openqa-investigate to automatically create auto-review tickets that handle restarting tests that fail for the same reason so that I do not need to retrigger manually while test maintainers have time to fix the sporadic issue

Suggestions

  • Incorporate https://progress.opensuse.org/projects/openqav3/wiki/#Categorization-scheme into an automatic decision tree with specific actions
  • Consider the different result states from the combination of openqa-investigate jobs ("X" meaning failed, "V" passed, "O" other failure; 2^4=16 possible combinations of results):

    • S0: retry X, last_good_test X, last_good_build X, last_good_test+build X -> infrastructure issue => report ticket in progress.opensuse.org/projects/openqa-infrastructure/issues/ with e.g. "Urgent" priority
    • S1: retry X, last_good_test X, last_good_build X, last_good_test+build V -> sporadic issue => see S8-15
    • S2: retry X, last_good_test X, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
    • S3: retry X, last_good_test X, last_good_build V, last_good_test+build V -> reproducible product issue => for a QAM test, write a comment on IBS/OBS or smelt; for a non-QAM test, report a product bug
    • S4: retry X, last_good_test V, last_good_build X, last_good_test+build X -> sporadic issue => see S8-15
    • S5: retry X, last_good_test V, last_good_build X, last_good_test+build V -> reproducible test regression => bisect git log, inform on pull request, report ticket in progress.opensuse.org/projects/openqatests/issues/
    • S6: retry X, last_good_test V, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
    • S7: retry X, last_good_test V, last_good_build V, last_good_test+build V -> sporadic issue => see S8-15
    • S8-15: retry V (all 8 combinations) -> sporadic issue => automatically create auto-review tickets that handle restarting tests and retrigger original -> first step #94105
  • For sporadic issues, bisect on all worker class settings in the "last_good vs. first_bad" diff except the scheduled one. For example, for a job scheduled against qemu_x86_64 where the last good job ran with worker classes "worker1,foo,bar" and the first bad job with "worker2,foo,baz", retrigger against worker1 as well as bar and baz to check the impact of those sub-classes

  • For sporadic issues, calculate the fail ratio and run as many tests as needed to reach a statistically significant number

  • Intermediate steps from weekly 2021-07-16:

    • we can always start with just writing yet another comment on openQA jobs
  • Simplified approach: If last_good+build fails, a product regression is unlikely; write a comment saying so

  • If last_good_test+build fails, an infrastructure issue is likely
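
The S0 to S15 states listed above can be read as a small lookup table. The following is a minimal sketch of that reading, not the actual openqa-investigate implementation: the function name `classify`, the constant names and the returned strings are hypothetical, and the "O" (other failure) result is deliberately rejected because the list above does not say how to handle it.

```python
# Hypothetical helper illustrating the S0-S15 list; not part of openqa-investigate.
SPORADIC = "sporadic issue"

# Keys are (last_good_test, last_good_build, last_good_test+build)
# for the case where the plain retry failed ("X").
REPRODUCIBLE = {
    ("X", "X", "X"): "infrastructure issue",          # S0
    ("X", "V", "V"): "reproducible product issue",    # S3
    ("V", "X", "V"): "reproducible test regression",  # S5
}


def classify(retry, last_good_test, last_good_build, last_good_test_and_build):
    """Map the four investigate results ("X" failed, "V" passed) to a category."""
    results = (retry, last_good_test, last_good_build, last_good_test_and_build)
    if any(r not in ("X", "V") for r in results):
        # "O" (other failure) is not covered by the list, so refuse to guess.
        raise ValueError(f"unsupported result in {results!r}")
    if retry == "V":
        return SPORADIC  # S8-15: the retry passed
    # S1, S2, S4, S6, S7 fall through to the sporadic default.
    return REPRODUCIBLE.get(results[1:], SPORADIC)


if __name__ == "__main__":
    print(classify("X", "V", "X", "V"))
```

Keeping the three reproducible states in a dictionary and defaulting everything else to "sporadic" mirrors how the list is written: only S0, S3 and S5 trigger a specific action.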

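The worker class suggestion above amounts to a set difference between the two WORKER_CLASS-style lists. The sketch below takes "all worker class settings in the diff except the scheduled one" literally; the function name and the comma-separated string input are assumptions. Note that this literal reading also yields worker2 for the example from the text, which names only worker1, bar and baz.

```python
# Hypothetical helper; the input format (comma-separated strings) is an assumption.
def worker_classes_to_retrigger(last_good, first_bad, scheduled):
    """Return worker classes that differ between last good and first bad job."""
    good = {c.strip() for c in last_good.split(",") if c.strip()}
    bad = {c.strip() for c in first_bad.split(",") if c.strip()}
    # Symmetric difference: classes present in exactly one of the two jobs,
    # minus the class the job was scheduled against.
    return sorted((good ^ bad) - {scheduled})


if __name__ == "__main__":
    print(worker_classes_to_retrigger(
        "worker1,foo,bar", "worker2,foo,baz", "qemu_x86_64"))
```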

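For the fail-ratio suggestion above, one possible way to decide "how many tests are enough" is sketched here. It assumes independent runs with a constant failure probability, which the ticket does not state: if a test fails with ratio p, then n passes in a row occur by chance with probability (1 - p)^n, so n is chosen such that this drops below a threshold `alpha`. The function names and the default of 0.05 are hypothetical.

```python
import math


def fail_ratio(results):
    """Fraction of failed runs in a list of "X" (failed) / "V" (passed) results."""
    if not results:
        raise ValueError("need at least one result")
    return sum(1 for r in results if r == "X") / len(results)


def runs_needed(p_fail, alpha=0.05):
    """Smallest n with (1 - p_fail)**n <= alpha, assuming independent runs."""
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1 - p_fail))


if __name__ == "__main__":
    ratio = fail_ratio(["X", "V", "V", "V"])
    print(ratio, runs_needed(ratio))
```

With a fail ratio of 0.25, eleven consecutive passes would be needed under this model before a sporadic failure could be considered gone at the 5% level.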
Subtasks 20 (3 open, 17 closed)

  • action #95742: In openqa-investigate jobs add URL to original job as setting (Resolved, okurz, 2021-07-20)
  • action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M (Resolved, Xiaojing_liu, 2021-07-20)
  • openQA Project - action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M (Resolved, tinita, 2021-09-18)
  • openQA Project - action #109920: Identify reproducible product issues using openqa-investigate size:M (Resolved, tinita, 2023-03-23)
  • action #110176: [spike solution] [timeboxed:10h] Restart hook script in delayed minion job based on exit code size:M (Resolved, kraih, 2022-06-15)
  • openQA Project - action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M (Resolved, mkittler, 2021-09-18)
  • openQA Project - action #110530: Do NOT call job_done_hooks if requested by test setting (Resolved, mkittler, 2021-09-18)
  • openQA Project - action #112523: Make hook scripts restartable with a special exit code (Resolved, kraih, 2022-06-15)
  • openQA Project - action #124991: Copy ids of other investigate jobs to retry job (Rejected, okurz, 2023-02-23)
  • openQA Project - action #126527: [spike] Parse comments to identify reproducible product issues using openqa-investigate size:M (Resolved, tinita, 2023-03-23)
  • openQA Project - action #132272: Identify reproducible *TEST* issues (not product issues anymore) using openqa-investigate size:M (Resolved, tinita)
  • openQA Project - action #132332: Multiple investigation comments for multimachine tests size:M (Resolved, tinita, 2023-03-23)
  • openQA Project - action #138299: Make the final aggregation messages from openqa-investigate more prominent size:S (Resolved, dheidler)
  • openQA Project - action #151399: Identify reproducible *infrastructure* issues using openqa-investigate size:M (Resolved, tinita, 2023-11-24)
  • openQA Project - action #151402: [spike solution][timeboxed:20h] Allow to search for tests by comment on the UI size:M (Resolved, mkittler, 2023-11-24)
  • openQA Project - action #152281: Schedule openQA SLE maintenance bisect jobs with lower priority same as openqa-investigate (Resolved, mkittler, 2023-12-08)
  • openQA Project - action #152851: Notify about reproducible *infrastructure* issues using openqa-investigate (New, 2023-11-24)
  • openQA Project - action #152853: Prevent faulty openQA workers causing wrong openqa-investigate conclusions size:M (Resolved, ybonatakis)
  • openQA Project - action #154027: [UI][UX] Allow to search for tests by comment on the UI in /tests (New)
  • openQA Project - action #154036: Allow to search for comments or tests by comment in the UI "search" bar (New)

Related issues 2 (0 open, 2 closed)

  • Related to openQA Project - action #91773: Automatic replacement of openQA job URLs preview of openQA size:M (Resolved, tinita, 2021-04-26)
  • Related to QA - action #107014: trigger openqa-trigger-bisect-jobs from our automatic investigations whenever the cause is not already known size:M (Resolved, tinita, 2022-02-17)
