action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M - QA (public) - openSUSE Project Management Tool

action #95746

closed

openQA Project (public) - coordination #102915: [saga][epic] Automated classification of failures

coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Identify likely "sporadic" openQA tests with "openqa-investigate" size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Target version:

openQA Project (public) - Ready

Start date:

2021-07-20

Due date:

% Done:

0%

Estimated time:

40.00 h

Description

Motivation¶

With openqa-investigate one can identify the root cause of failed tests. One possible cause is a "sporadic" openQA test issue, which can be identified (among other symptoms) when the "retry" job triggered by openqa-investigate passes after the original job failed. We could feed this information back to the original job automatically as soon as the "retry" job has finished.

Acceptance criteria¶

AC1: The job details page of the original failed job shows whether the issue is likely "sporadic" or not

Suggestions¶

Extend openqa-investigate so that, instead of ignoring investigation jobs themselves in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L11, it detects them, identifies the ":retry:" ones and comments back on the original job (e.g. by following the URL in the openQA setting OPENQA_INVESTIGATION_ORIGIN, see #95742) with the result of the retry, stating whether the original issue is likely sporadic or not
An alternative to "openqa-investigate" would be another, dedicated script or putting that functionality directly into openQA
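
The first suggestion could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual openqa-investigate code: the job dictionary layout, the function name `retry_verdict` and the `:retry` substring check are hypothetical. Only the OPENQA_INVESTIGATION_ORIGIN setting and the idea of commenting back on the original job come from the suggestion itself.

```python
from typing import Optional, Tuple


def retry_verdict(job: dict) -> Optional[Tuple[str, str]]:
    """Return (origin_url, comment_text) for a finished retry job, else None.

    `job` is a hypothetical dict with "name", "result" and "settings" keys;
    the real job data may be structured differently.
    """
    name = job.get("name", "")
    origin = job.get("settings", {}).get("OPENQA_INVESTIGATION_ORIGIN")
    # Assumption: retry investigation jobs carry ":retry" in their name
    # and record the original job URL in OPENQA_INVESTIGATION_ORIGIN.
    if ":retry" not in name or not origin:
        return None
    if job.get("result") == "passed":
        verdict = "likely sporadic: the retry passed"
    else:
        verdict = "likely not sporadic: the retry did not pass"
    return origin, f"openqa-investigate retry result, {verdict}"


if __name__ == "__main__":
    example = {
        "name": "some_test:investigate:retry",
        "result": "passed",
        "settings": {"OPENQA_INVESTIGATION_ORIGIN": "https://openqa.example/tests/1"},
    }
    print(retry_verdict(example))
```

The function only decides what to say and where; actually posting the comment is left out on purpose, since that depends on the API client in use.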

Related issues 2 (0 open — 2 closed)

#1

Updated by okurz over 3 years ago

Description updated (diff)

#2

Updated by ilausuch over 3 years ago

Subject changed from Identify likely "sporadic" openQA tests with "openqa-investigate" to Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
Status changed from New to Workable

#3

Updated by ilausuch over 3 years ago

Status changed from Workable to In Progress

#4

Updated by ilausuch over 3 years ago

Assignee set to ilausuch

#5

Updated by openqa_review over 3 years ago

Due date set to 2021-09-08

Setting due date based on mean cycle time of SUSE QE Tools

#6

Updated by ilausuch over 3 years ago

I prepared this PR following the first suggestion: https://github.com/os-autoinst/scripts/pull/104

#7

Updated by ilausuch over 3 years ago

Status changed from In Progress to Workable

I changed the status to Workable because I won't be working in this group for a while.

#8

Updated by ilausuch over 3 years ago

Due date deleted (~~2021-09-08~~)
Assignee deleted (~~ilausuch~~)

#9

Updated by Xiaojing_liu over 3 years ago

Status changed from Workable to In Progress
Assignee set to Xiaojing_liu

#10

Updated by VANASTASIADIS over 3 years ago

Due date set to 2021-09-23

Setting due date based on mean cycle time of SUSE QE Tools

#11

Updated by okurz over 3 years ago

Related to action #91773: Automatic replacement of openQA job URLs preview of openQA size:M added

#12

Updated by Xiaojing_liu over 3 years ago

Committed a pull request: https://github.com/os-autoinst/scripts/pull/109
Based on this pull request we still need to add a configuration entry to openqa.ini, such as

job_done_hook_passed = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate

With this, when the investigation retry job passes, a comment is added to the original job.
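
To make the hook line above easier to read, here is a small Python sketch that splits it into the environment variables passed via `env` and the command that is actually run. The helper name `split_hook` is hypothetical, and the way the regex is applied at the end (matched against a job group name) is an assumption about how `exclude_group_regex` is meant to be used; the group names are made up for illustration.

```python
import re
import shlex

# The value of job_done_hook_passed from the configuration line above.
HOOK = (
    "env host=openqa.suse.de "
    "exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' "
    "grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate"
)


def split_hook(hook: str):
    """Split an `env KEY=VALUE ... command` line into (env dict, command list)."""
    tokens = shlex.split(hook)
    if not tokens or tokens[0] != "env":
        raise ValueError("expected the hook to start with 'env'")
    env = {}
    index = 1
    # Leading KEY=VALUE tokens are variable assignments; the rest is the command.
    while index < len(tokens) and "=" in tokens[index]:
        key, value = tokens[index].split("=", 1)
        env[key] = value
        index += 1
    return env, tokens[index:]


env, command = split_hook(HOOK)
excluded = re.compile(env["exclude_group_regex"])

if __name__ == "__main__":
    print(env)
    print(command)
    # Hypothetical group names, only to show the effect of the regex.
    for group in ("Maintenance: Public Cloud", "Functional"):
        print(group, "excluded" if excluded.match(group) else "investigated")
```

Read this way, the line sets three variables (`host`, `exclude_group_regex`, `grep_timeout`) and runs openqa-investigate at low CPU and I/O priority through `nice` and `ionice -c idle`.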

#13

Updated by okurz over 3 years ago

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?

#14

Updated by Xiaojing_liu over 3 years ago

okurz wrote:

hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?

Another PR does this directly within openQA: https://github.com/os-autoinst/openQA/pull/4206
That way no job hook script needs to be executed for every passed job.

#15

Updated by okurz over 3 years ago

Xiaojing_liu wrote:

Another PR does this directly within openQA: https://github.com/os-autoinst/openQA/pull/4206
That way no job hook script needs to be executed for every passed job.

That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

#16

Updated by Xiaojing_liu over 3 years ago

okurz wrote:

That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.

OK. I closed the PR in openQA and updated https://github.com/os-autoinst/scripts/pull/109

#17

Updated by Xiaojing_liu over 3 years ago

Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added

#18

Updated by Xiaojing_liu over 3 years ago

Status changed from In Progress to Resolved

The PR has been merged; here is an example showing that it works: https://openqa.suse.de/tests/7190164#comments

#19

Updated by Xiaojing_liu over 3 years ago

Estimated time set to 40.00 h

#20

Updated by okurz over 3 years ago

Due date deleted (~~2021-09-23~~)
