action #95746
closed openQA Project (public) - coordination #102915: [saga][epic] Automated classification of failures
coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
0%
Description
Motivation
Using openqa-investigate one can identify the root cause of failed tests. One cause could be "sporadic" openQA test issues, which can (among other symptoms) be identified if the "retry" job triggered by openqa-investigate passes after the original job failed. We could feed this information back automatically to the original job as soon as the "retry" job has finished.
Acceptance criteria
- AC1: The job details page of the original failed job shows whether the issue is likely "sporadic" or not
Suggestions
- Extend openqa-investigate so that, instead of ignoring investigation jobs themselves (see https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L11), it detects them, identifies the ":retry:" ones and posts a comment on the original job (e.g. by following the URL in the openQA setting OPENQA_INVESTIGATION_ORIGIN, see #95742) stating, based on the result of the retry, whether the original issue is likely sporadic or not
- Alternative to "openqa-investigate" would be another, dedicated script or putting that functionality directly within openQA
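The detection step from the first suggestion could be sketched roughly as follows. This is only an illustration in Python, not the code of openqa-investigate (which is a shell script). The job dictionary shape, the helper names `is_retry_job` and `origin_job_id`, the ":retry" naming pattern in the TEST setting, and the URL layout of OPENQA_INVESTIGATION_ORIGIN are all assumptions made for this example.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: retry investigation jobs carry ":retry" in their test name,
# e.g. "textmode:investigate:retry".
RETRY_MARKER = re.compile(r":retry\b")


def is_retry_job(job: dict) -> bool:
    """Return True if the job looks like an investigation retry job."""
    settings = job.get("settings", {})
    test_name = settings.get("TEST") or job.get("name", "")
    return bool(RETRY_MARKER.search(test_name))


def origin_job_id(job: dict) -> Optional[int]:
    """Extract the id of the original job from OPENQA_INVESTIGATION_ORIGIN.

    Assumption: the setting holds a job URL whose path ends in
    /t<id> or /tests/<id>.
    """
    origin = job.get("settings", {}).get("OPENQA_INVESTIGATION_ORIGIN")
    if not origin:
        return None
    match = re.search(r"/(?:t|tests/)(\d+)/?$", urlparse(origin).path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical job data, for illustration only
    retry = {
        "settings": {
            "TEST": "textmode:investigate:retry",
            "OPENQA_INVESTIGATION_ORIGIN": "https://openqa.example.com/tests/123",
        }
    }
    if is_retry_job(retry):
        print("would comment on original job", origin_job_id(retry))
```

A job that is not a retry, or that lacks the origin setting, is simply skipped, so the existing behaviour for ordinary failed jobs would stay untouched.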
Updated by ilausuch over 3 years ago
- Subject changed from Identify likely "sporadic" openQA tests with "openqa-investigate" to Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
- Status changed from New to Workable
Updated by ilausuch over 3 years ago
- Status changed from Workable to In Progress
Updated by openqa_review over 3 years ago
- Due date set to 2021-09-08
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ilausuch over 3 years ago
I prepared this PR following the first suggestion https://github.com/os-autoinst/scripts/pull/104
Updated by ilausuch over 3 years ago
- Status changed from In Progress to Workable
I changed the status to Workable because I won't be working in this group for a while
Updated by ilausuch over 3 years ago
- Due date deleted (2021-09-08)
- Assignee deleted (ilausuch)
Updated by Xiaojing_liu over 3 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
Updated by VANASTASIADIS over 3 years ago
- Due date set to 2021-09-23
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 3 years ago
- Related to action #91773: Automatic replacement of openQA job URLs preview of openQA size:M added
Updated by Xiaojing_liu about 3 years ago
Committed a pull request: https://github.com/os-autoinst/scripts/pull/109
Based on this pull request, we still need to add a configuration entry to openqa.ini, such as:
job_done_hook_passed = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate
With this, a comment is added to the original job when the investigation retry job passes.
Updated by okurz about 3 years ago
hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?
Updated by Xiaojing_liu about 3 years ago
okurz wrote:
hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?
Another pr for doing this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
So it doesn't need to execute job hook script for every passed job.
Updated by okurz about 3 years ago
Xiaojing_liu wrote:
okurz wrote:
hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?
Another pr for doing this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
So it doesn't need to execute job hook script for every passed job.
That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.
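To make the load argument concrete, here is a minimal Python sketch of the proposal to act only on failed retry jobs. It is not the merged implementation. The function names, the comment wording and the sample result counts are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def hook_invocations(job_results: Iterable[str], hooked_results: Set[str]) -> int:
    """Count how many hook scripts would be spawned for the given job results."""
    counts = Counter(job_results)
    return sum(counts[result] for result in hooked_results)


def verdict_from_failed_retry(retry_result: str) -> Optional[str]:
    """Return a comment for the original job, or None if nothing is posted.

    Under this proposal only the hook for failed jobs runs, so a passed
    retry never produces a comment.
    """
    if retry_result == "failed":
        return "Retry job failed as well: the issue is likely not sporadic."
    return None


if __name__ == "__main__":
    # Hypothetical mix of results: most jobs pass, a few fail
    results = ["passed"] * 8 + ["failed"] * 2
    print(hook_invocations(results, {"failed"}))            # hook on failed only
    print(hook_invocations(results, {"failed", "passed"}))  # hook on passed too
    print(verdict_from_failed_retry("failed"))
```

The trade-off is that the original job only receives a negative statement ("likely not sporadic"). A passed retry, which is the actual evidence for a sporadic issue, leaves no trace unless passed jobs are hooked as well.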
Updated by Xiaojing_liu about 3 years ago
okurz wrote:
Xiaojing_liu wrote:
okurz wrote:
hm, executing job hook scripts also for every passed job might cause some significant additional load. In that case how about turning it around and only acting on failed retry jobs to mark the original job as likely not sporadic because the retry reproduces a failure?
Another pr for doing this within openQA directly: https://github.com/os-autoinst/openQA/pull/4206
So it doesn't need to execute job hook script for every passed job.
That would still mean additional work in a minion. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.
OK. I closed the PR in openQA and updated https://github.com/os-autoinst/scripts/pull/109
Updated by Xiaojing_liu about 3 years ago
- Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added
Updated by Xiaojing_liu about 3 years ago
- Status changed from In Progress to Resolved
The PR has been merged, and here is an example showing that it works: https://openqa.suse.de/tests/7190164#comments