QA (public)

Target version: future
Start date: 2021-07-20
Due date:
% Done: 82%
Estimated time: 40.00 h (total)

Description

User stories

US1: As a test writer who introduces regressions in test code, I want to be informed on my pull request if it is unambiguously identified as the culprit for failing openQA tests, so that I can quickly provide a regression fix
US2: As a test writer, I want to be informed on my pull request if a failing openQA test is identified as a clear test regression with multiple git commits as candidates that likely introduced it, so that I can cross-check whether my PR could have caused the failure
US3: As a QE squad PO, I would like to receive a ticket in the openQA tests issue tracker if a failing openQA test is identified as a clear test regression, so that we can plan to fix that issue
US4: As an openQA infrastructure admin, I would like to receive a ticket in the openQA infrastructure issue tracker if a failing openQA test is identified as a clear infrastructure regression, so that we can plan to fix that issue
US5: As a maintenance coordination engineer creating maintenance (release) requests, I want to receive a notification if a failing openQA test is identified as a clear product regression, i.e. if the submitted changes trigger the problem, so that I can fix my submission
US6: As an openQA test reviewer, I want openqa-investigate to automatically create auto-review tickets that handle restarting tests failing for the same reason, so that I do not need to retrigger them manually while test maintainers have time to fix the sporadic issue

Suggestions

Incorporate https://progress.opensuse.org/projects/openqav3/wiki/#Categorization-scheme into an automatic decision tree with specific actions
Consider the different states resulting from the combination of openqa-investigate job results ("X" meaning failed, "V" passed, "O" other failure; counting only "X" and "V", the four investigation jobs give 2^4=16 possible combinations of results):
S0: retry X, last_good_test X, last_good_build X, last_good_test+build X -> infrastructure issue => report a ticket in progress.opensuse.org/projects/openqa-infrastructure/issues/ with, e.g., "Urgent" priority
S1: retry X, last_good_test X, last_good_build X, last_good_test+build V -> sporadic issue => see S8-15
S2: retry X, last_good_test X, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
S3: retry X, last_good_test X, last_good_build V, last_good_test+build V -> reproducible product issue => for QAM tests, write a comment on IBS/OBS or smelt; for non-QAM tests, report a product bug
S4: retry X, last_good_test V, last_good_build X, last_good_test+build X -> sporadic issue => see S8-15
S5: retry X, last_good_test V, last_good_build X, last_good_test+build V -> reproducible test regression => bisect the git log, inform on the pull request, report a ticket in progress.opensuse.org/projects/openqatests/issues/
S6: retry X, last_good_test V, last_good_build V, last_good_test+build X -> sporadic issue => see S8-15
S7: retry X, last_good_test V, last_good_build V, last_good_test+build V -> sporadic issue => see S8-15
S8-15: retry V (all 8 combinations) -> sporadic issue => automatically create auto-review tickets that handle restarting tests and retrigger the original job -> first step: #94105
for sporadic issues, bisect on all worker class settings in the "last_good vs. first_bad" diff except the scheduled one. For example, for a job scheduled against qemu_x86_64 where the worker of the last good job has the classes "worker1,foo,bar" and the worker of the first bad job has "worker2,foo,baz", retrigger against worker1 as well as bar and baz to check for the impact of these sub-classes
for sporadic issues, calculate the fail ratio and run as many tests as needed to get a statistically significant result
Intermediate steps from the weekly meeting on 2021-07-16:
we can always start by just writing yet another comment on openQA jobs
Simplified approach: if last_good_build fails, a product regression is unlikely; state that in a comment
If last_good_test+build fails, it is likely an infrastructure issue
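The S0-S15 table above can be read as a small decision function. The following is a minimal Python sketch, assuming each investigation job result is reduced to a boolean (True for "V", False for "X"; the "O" state is not modelled). The names `Verdict` and `classify` are hypothetical and do not reflect how openqa-investigate itself is implemented.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome categories from the S0-S15 table (hypothetical names)."""
    INFRASTRUCTURE = "infrastructure issue"
    PRODUCT = "reproducible product issue"
    TEST = "reproducible test regression"
    SPORADIC = "sporadic issue"


def classify(retry: bool, last_good_test: bool, last_good_build: bool,
             last_good_test_and_build: bool) -> Verdict:
    """Map pass (True) / fail (False) of the four investigation jobs
    to a verdict following S0-S15."""
    if retry:
        # S8-15: the plain retry passed, so the failure did not reproduce
        return Verdict.SPORADIC
    results = (last_good_test, last_good_build, last_good_test_and_build)
    if results == (False, False, False):
        # S0: fails even with the last good test code and the last good build
        return Verdict.INFRASTRUCTURE
    if results == (False, True, True):
        # S3: fails whenever the current build is used, passes with the last good build
        return Verdict.PRODUCT
    if results == (True, False, True):
        # S5: fails whenever the current test code is used, passes with the last good test
        return Verdict.TEST
    # S1, S2, S4, S6, S7: inconsistent results, handled like S8-15
    return Verdict.SPORADIC


if __name__ == "__main__":
    # S3: retry X, last_good_test X, last_good_build V, last_good_test+build V
    print(classify(False, False, True, True).value)  # reproducible product issue
```

Only three of the 16 combinations lead to a definite verdict other than "sporadic", which mirrors the table: every inconsistent combination falls back to the S8-15 handling.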
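The worker class bisection idea for sporadic issues boils down to a set difference. This is a minimal Python sketch, assuming the worker classes of the last good and first bad jobs are available as comma-separated strings; `parse_classes` and `bisect_worker_classes` are hypothetical helpers. The `exclude` parameter is an assumption too: the example above retriggers against worker1, bar and baz but neither the scheduled class qemu_x86_64 nor worker2, which we assume is skipped because the first bad job already ran there.

```python
def parse_classes(worker_class: str) -> set[str]:
    """Split a comma-separated worker class string into a set of classes."""
    return {c.strip() for c in worker_class.split(",") if c.strip()}


def bisect_worker_classes(last_good: str, first_bad: str,
                          exclude: set[str]) -> list[str]:
    """Return the classes that differ between the worker of the last good
    job and the worker of the first bad job (symmetric difference),
    minus the classes in `exclude`, sorted for stable output."""
    diff = parse_classes(last_good) ^ parse_classes(first_bad)
    return sorted(diff - exclude)


if __name__ == "__main__":
    candidates = bisect_worker_classes(
        "worker1,foo,bar", "worker2,foo,baz",
        exclude={"qemu_x86_64", "worker2"},
    )
    print(candidates)  # ['bar', 'baz', 'worker1']
```

Each returned class would then be the target of one retriggered job; classes shared by both workers (here foo) are skipped because they cannot explain the difference.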
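For the fail ratio suggestion, one common way to decide how many runs are "as many as needed" (our choice here, not something this ticket specifies) is: if a test fails with probability p per run, n consecutive passes occur with probability (1 - p)^n, so to observe at least one failure with confidence c, n must satisfy (1 - p)^n <= 1 - c. A minimal Python sketch with hypothetical function names:

```python
import math


def fail_ratio(results: list[bool]) -> float:
    """Fraction of failed runs; `results` holds True for passed, False for failed."""
    if not results:
        raise ValueError("need at least one result")
    return results.count(False) / len(results)


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n with (1 - p) ** n <= 1 - confidence, i.e. the number of
    runs after which a failure with per-run probability p shows up at
    least once with the given confidence."""
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    runs = [True, True, False, True, True, True, True, True, True, True]
    print(fail_ratio(runs))        # 0.1
    print(runs_needed(0.1, 0.95))  # 29
```

With an observed fail ratio of 10%, about 29 runs would be needed to see the failure at least once with 95% confidence; rarer failures need correspondingly more runs.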

Subtasks 28 (5 open, 23 closed)

action #95742: In openqa-investigate jobs add URL to original job as setting (Resolved, 2021-07-20)
action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M (Resolved, Xiaojing_liu, 2021-07-20)
openQA Project (public) - action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M (Resolved, 2021-09-18)
openQA Project (public) - action #109920: Identify reproducible product issues using openqa-investigate size:M (Resolved, 2023-03-23)
action #110176: [spike solution] [timeboxed:10h] Restart hook script in delayed minion job based on exit code size:M (Resolved, kraih, 2022-06-15)
openQA Project (public) - action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M (Resolved, 2021-09-18)
openQA Project (public) - action #110530: Do NOT call job_done_hooks if requested by test setting (Resolved, 2021-09-18)
openQA Project (public) - action #112523: Make hook scripts restartable with a special exit code (Resolved, kraih, 2022-06-15)
openQA Project (public) - action #124991: Copy ids of other investigate jobs to retry job (Rejected, 2023-02-23)
openQA Project (public) - action #126527: [spike] Parse comments to identify reproducible product issues using openqa-investigate size:M (Resolved, 2023-03-23)
openQA Project (public) - action #132272: Identify reproducible *TEST* issues (not product issues anymore) using openqa-investigate size:M (Resolved)
openQA Project (public) - action #132332: Multiple investigation comments for multimachine tests size:M (Resolved, 2023-03-23)
openQA Project (public) - action #138299: Make the final aggregation messages from openqa-investigate more prominent size:S (Resolved, dheidler)
openQA Project (public) - action #151399: Identify reproducible *infrastructure* issues using openqa-investigate size:M (Resolved, 2023-11-24)
openQA Project (public) - action #151402: [spike solution][timeboxed:20h] Allow to search for tests by comment on the UI size:M (Resolved, 2023-11-24)
openQA Project (public) - action #152281: Schedule openQA SLE maintenance bisect jobs with lower priority same as openqa-investigate (Resolved, 2023-12-08)
openQA Project (public) - action #152851: Notify about reproducible *infrastructure* issues using openqa-investigate (New, 2023-11-24)
openQA Project (public) - action #152853: Prevent faulty openQA workers causing wrong openqa-investigate conclusions size:M (Resolved, ybonatakis)
openQA Project (public) - action #154027: [UI][UX] Allow to search for tests by comment on the UI in /tests (New)
openQA Project (public) - action #154036: Allow to search for comments or tests by comment in the UI "search" bar (New)
openQA Project (public) - action #176418: last_good_tests_and_build is not triggered even though matching worker instance seems to be free and 0 jobs running due to jobs as part of parallel clusters (Resolved, 2025-02-01)
openQA Project (public) - action #176730: [openqa-investigate] Investigation for job clusters creating multiple comments and producing error messages in the logs (New, 2025-02-07)
openQA Project (public) - action #176886: A "+" and other characters used in test names in $var are considered invalid in WORKER_CLASS:$var size:S (Resolved)
openQA Project (public) - action #177267: Forbid unprintable characters, white space, colons, equal signs and quotes in TEST names size:S (Resolved, 2025-02-14)
openQA Project (public) - action #180218: openqa-investigate leaves temporary job comments "Starting investigation for job ..." size:S (Resolved, gpuliti, 2025-04-08)
openQA Project (public) - action #181421: Investigate hook_script minion jobs should detect when jobs are cancelled (New, 2025-04-25)
openQA Project (public) - action #181427: [easy][beginner] Run investigation hook_script minion jobs with linear backoff size:S (Resolved, ybonatakis, 2025-04-25)
openQA Project (public) - action #182264: [Alert] web UI: Minion jobs failed hook alert Salt minion_jobs_failed_hook_alert size:S (Resolved, 2025-05-13)

Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #91773: Automatic replacement of openQA job URLs preview of openQA size:M (Resolved, 2021-04-26)
Related to QA (public) - action #107014: trigger openqa-trigger-bisect-jobs from our automatic investigations whenever the cause is not already known size:M (Resolved, 2022-02-17)