action #95746

openQA Project - coordination #102915: [saga][epic] Automated classification of failures

coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Identify likely "sporadic" openQA tests with "openqa-investigate" size:M

Added by okurz 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2021-07-20
Due date:
% Done: 0%
Estimated time: 40.00 h

Description

Motivation

Using openqa-investigate, one can identify the root cause of failed tests. One possible cause is a "sporadic" openQA test issue, which can (among other symptoms) be identified when the "retry" job triggered by openqa-investigate passes after the original job failed. We could feed this information back to the original job automatically as soon as the "retry" job finishes.

Acceptance criteria

  • AC1: The job details page of the original failed job shows whether the issue is likely "sporadic" or not

Suggestions

  • Extend openqa-investigate so that, instead of ignoring investigation jobs themselves in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L11, it detects them, identifies the ":retry:" ones and comments back on the original job (e.g. by following the URL in the openQA setting OPENQA_INVESTIGATION_ORIGIN, see #95742) on whether the retry result indicates that the original issue is likely sporadic or not
  • An alternative to "openqa-investigate" would be another, dedicated script, or putting that functionality directly into openQA
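
The first suggestion could be sketched roughly as follows. This is only an illustration in Python, not the actual openqa-investigate code (which is a shell script). The layout of the `job` dictionary, the `:retry` marker check on the test name, the comment wording and the function name `classify_retry` are all assumptions made for this example.

```python
from typing import Optional, Tuple

RETRY_MARKER = ":retry"  # assumed marker in the test name of retry jobs
ORIGIN_SETTING = "OPENQA_INVESTIGATION_ORIGIN"


def classify_retry(job: dict) -> Optional[Tuple[str, str]]:
    """Return (origin_url, comment) for a finished retry job, else None.

    `job` is a hypothetical dict with "test", "result" and "settings" keys.
    """
    origin = job.get("settings", {}).get(ORIGIN_SETTING)
    if not origin or RETRY_MARKER not in job.get("test", ""):
        return None  # not a retry investigation job, nothing to report
    result = job.get("result")
    if result == "passed":
        verdict = "likely a sporadic issue: the retry passed"
    elif result == "failed":
        verdict = "likely not sporadic: the retry reproduced the failure"
    else:
        return None  # unfinished or inconclusive results are skipped
    return origin, f"Investigate retry job result: {verdict}"
```

The returned URL identifies the original job to comment on; jobs without the origin setting or without the retry marker are ignored, as are results other than passed or failed.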

Related issues

Related to openQA Project - action #91773: Automatic replacement of openQA job URLs preview of openQA size:M (Resolved, 2021-04-26)

Related to openQA Project - action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes (New, 2021-09-18)

History

#1 Updated by okurz 11 months ago

  • Description updated (diff)

#2 Updated by ilausuch 11 months ago

  • Subject changed from Identify likely "sporadic" openQA tests with "openqa-investigate" to Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
  • Status changed from New to Workable

#3 Updated by ilausuch 10 months ago

  • Status changed from Workable to In Progress

#4 Updated by ilausuch 10 months ago

  • Assignee set to ilausuch

#5 Updated by openqa_review 10 months ago

  • Due date set to 2021-09-08

Setting due date based on mean cycle time of SUSE QE Tools

#6 Updated by ilausuch 10 months ago

I prepared this PR following the first suggestion https://github.com/os-autoinst/scripts/pull/104

#7 Updated by ilausuch 10 months ago

  • Status changed from In Progress to Workable

I'm changing the status to Workable because I won't be working in this group for a while.

#8 Updated by ilausuch 10 months ago

  • Due date deleted (2021-09-08)
  • Assignee deleted (ilausuch)

#9 Updated by Xiaojing_liu 10 months ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu

#10 Updated by VANASTASIADIS 10 months ago

  • Due date set to 2021-09-23

Setting due date based on mean cycle time of SUSE QE Tools

#11 Updated by okurz 10 months ago

  • Related to action #91773: Automatic replacement of openQA job URLs preview of openQA size:M added

#12 Updated by Xiaojing_liu 10 months ago

Created a pull request: https://github.com/os-autoinst/scripts/pull/109
Based on this pull request, we still need to add a configuration to openqa.ini, such as

job_done_hook_passed = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate

With this, when the investigate retry job passes, a comment is posted on the original job.
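
Posting such a comment could look roughly like the sketch below. It assumes the OPENQA_INVESTIGATION_ORIGIN setting holds a URL of the form `https://<host>/tests/<id>`, and it only builds an `openqa-cli api` command line rather than running it. The helper name `comment_command` is hypothetical and not part of the pull request.

```python
from typing import List
from urllib.parse import urlparse


def comment_command(origin_url: str, text: str) -> List[str]:
    """Build an openqa-cli call that comments on the job behind origin_url."""
    parsed = urlparse(origin_url)
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed URL layout: /tests/<numeric job id>
    if len(parts) != 2 or parts[0] != "tests" or not parts[1].isdigit():
        raise ValueError(f"unexpected origin URL: {origin_url}")
    host = f"{parsed.scheme}://{parsed.netloc}"
    return ["openqa-cli", "api", "--host", host, "-X", "POST",
            f"jobs/{parts[1]}/comments", f"text={text}"]
```

The resulting list could then be passed to something like `subprocess.run(cmd, check=True)` on a host where openqa-cli is installed and API credentials are configured.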

#13 Updated by okurz 10 months ago

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?

#14 Updated by Xiaojing_liu 9 months ago

okurz wrote:

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?

Another pr for doing this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
So it doesn't need to execute job hook script for every passed job.

#15 Updated by okurz 9 months ago

Xiaojing_liu wrote:

Another pr for doing this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
So it doesn't need to execute job hook script for every passed job.

That would still mean additional work in a minion job. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

#16 Updated by Xiaojing_liu 9 months ago

okurz wrote:

That would still mean additional work in a minion job. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

OK. I closed the PR in openQA and updated https://github.com/os-autoinst/scripts/pull/109

#17 Updated by Xiaojing_liu 9 months ago

  • Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes added

#18 Updated by Xiaojing_liu 9 months ago

  • Status changed from In Progress to Resolved

The PR has been merged, and here is an example showing that it works: https://openqa.suse.de/tests/7190164#comments

#19 Updated by Xiaojing_liu 9 months ago

  • Estimated time set to 40.00 h

#20 Updated by okurz 9 months ago

  • Due date deleted (2021-09-23)
