action #95746

closed

openQA Project (public) - coordination #102915: [saga][epic] Automated classification of failures

coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Identify likely "sporadic" openQA tests with "openqa-investigate" size:M

Added by okurz over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Start date: 2021-07-20
Due date:
% Done: 0%
Estimated time: 40.00 h

Description

Motivation

Using openqa-investigate one can identify the root cause of failed tests. One cause could be "sporadic" openQA test issues, which can (among other symptoms) be identified when the "retry" job triggered by openqa-investigate passes after the original job failed. We could feed this information back to the original job automatically as soon as the "retry" job has finished.

Acceptance criteria

  • AC1: The job details page of the original failed job shows whether the issue is likely "sporadic" or not

Suggestions

  • Extend openqa-investigate so that, instead of ignoring investigation jobs themselves in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L11, it detects them, identifies the ":retry:" ones and posts a comment on the original job (e.g. by following the URL in the openQA setting OPENQA_INVESTIGATION_ORIGIN, see #95742) stating, based on the retry result, whether the original issue is likely sporadic or not
  • An alternative to "openqa-investigate" would be another, dedicated script, or putting that functionality directly into openQA
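
The first suggestion could be sketched roughly as follows. This is only an illustration in Python, not the actual openqa-investigate code (which is a shell script). The function names, the exact retry naming pattern and the assumption that OPENQA_INVESTIGATION_ORIGIN holds a job URL ending in /tests/<id> are all hypothetical; only the ":retry:" marker and the setting name come from this ticket.

```python
import re
from typing import Optional

# Name marker for retry jobs; the exact naming scheme is an assumption.
RETRY_PATTERN = re.compile(r":retry(:|$)")


def is_retry_job(name: str) -> bool:
    """Return True if the job name looks like an openqa-investigate retry job."""
    return RETRY_PATTERN.search(name) is not None


def origin_job_id(settings: dict) -> Optional[int]:
    """Extract the original job id from the OPENQA_INVESTIGATION_ORIGIN setting."""
    origin = settings.get("OPENQA_INVESTIGATION_ORIGIN", "")
    # Assumes the setting contains a job URL of the form .../tests/<id>.
    match = re.search(r"/tests/(\d+)", origin)
    return int(match.group(1)) if match else None
```

A caller would then post the comment on the job with the returned id; how the comment is submitted is left out of this sketch.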

Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #91773: Automatic replacement of openQA job URLs preview of openQA size:M (Resolved, tinita, 2021-04-26)
Related to openQA Project (public) - action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M (Resolved, tinita, 2021-09-18)
Actions #1

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #2

Updated by ilausuch over 3 years ago

  • Subject changed from Identify likely "sporadic" openQA tests with "openqa-investigate" to Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
  • Status changed from New to Workable
Actions #3

Updated by ilausuch over 3 years ago

  • Status changed from Workable to In Progress
Actions #4

Updated by ilausuch over 3 years ago

  • Assignee set to ilausuch
Actions #5

Updated by openqa_review over 3 years ago

  • Due date set to 2021-09-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by ilausuch over 3 years ago

I prepared this PR following the first suggestion: https://github.com/os-autoinst/scripts/pull/104

Actions #7

Updated by ilausuch over 3 years ago

  • Status changed from In Progress to Workable

I changed the status to Workable because I won't be working in this group for a while.

Actions #8

Updated by ilausuch over 3 years ago

  • Due date deleted (2021-09-08)
  • Assignee deleted (ilausuch)
Actions #9

Updated by Xiaojing_liu over 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu
Actions #10

Updated by VANASTASIADIS over 3 years ago

  • Due date set to 2021-09-23

Setting due date based on mean cycle time of SUSE QE Tools

Actions #11

Updated by okurz over 3 years ago

  • Related to action #91773: Automatic replacement of openQA job URLs preview of openQA size:M added
Actions #12

Updated by Xiaojing_liu over 3 years ago

Committed a pull request: https://github.com/os-autoinst/scripts/pull/109
Based on this pull request, we still need to add a configuration entry to openqa.ini, such as

job_done_hook_passed = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate

With this, when the investigation retry job passes, a comment is added to the original job.

Actions #13

Updated by okurz over 3 years ago

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?
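
The inverted approach proposed here could look roughly like the following. It is a minimal Python sketch assuming a hook that is only invoked for failed jobs; the function name, its arguments and the comment wording are hypothetical and not taken from any of the pull requests.

```python
from typing import Optional


def comment_for_failed_job(name: str, result: str) -> Optional[str]:
    """Hypothetical decision step for a hook that only runs on failed jobs."""
    if result != "failed":
        # A failed-job hook should never see other results; guard anyway.
        return None
    if ":retry" not in name:
        # Not a retry job spawned by openqa-investigate, nothing to report.
        return None
    # The retry reproduced a failure, so the original is likely not sporadic.
    return "Retry job also failed: the original failure is likely not sporadic."
```

Because this check only runs inside the hook that is already spawned for failed jobs, it would not add a hook execution for every passed job.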

Actions #14

Updated by Xiaojing_liu over 3 years ago

okurz wrote:

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?

Here is another PR that does this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
That way there is no need to execute the job hook script for every passed job.

Actions #15

Updated by okurz over 3 years ago

Xiaojing_liu wrote:

Here is another PR that does this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
That way there is no need to execute the job hook script for every passed job.

That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

Actions #16

Updated by Xiaojing_liu over 3 years ago

okurz wrote:

That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

OK. I closed the PR in openQA and updated https://github.com/os-autoinst/scripts/pull/109

Actions #17

Updated by Xiaojing_liu over 3 years ago

  • Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added
Actions #18

Updated by Xiaojing_liu about 3 years ago

  • Status changed from In Progress to Resolved

The PR has been merged, and here is an example showing that it works: https://openqa.suse.de/tests/7190164#comments

Actions #19

Updated by Xiaojing_liu about 3 years ago

  • Estimated time set to 40.00 h
Actions #20

Updated by okurz about 3 years ago

  • Due date deleted (2021-09-23)