action #95746
closed
openQA Project (public) - coordination #102915: [saga][epic] Automated classification of failures
coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
Added by okurz over 3 years ago.
Updated about 3 years ago.
Description
Motivation
Using openqa-investigate, one can identify the root cause of failed tests. One possible cause is "sporadic" openQA test issues, which can (among other symptoms) be identified when the "retry" job triggered by openqa-investigate passes after the original job failed. We could feed this information back to the original job automatically as soon as the "retry" job finishes.
Acceptance criteria
- AC1: The job details page of the original failed job shows whether the failure is likely "sporadic" or not
Suggestions
- Extend openqa-investigate so that, instead of ignoring investigation jobs themselves (see https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L11), it detects them, identifies the ":retry:" ones and posts a comment on the original job (e.g. by following the URL in the openQA setting OPENQA_INVESTIGATION_ORIGIN, see #95742) stating, based on the retry result, whether the original issue is likely sporadic or not
- An alternative to "openqa-investigate" would be another dedicated script, or putting this functionality directly into openQA
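The first suggestion could look roughly like the following minimal Python sketch. It is not the actual openqa-investigate implementation (which is a shell script) and it does not post anything to openQA. It assumes that a job is available as a plain dict, that retry jobs carry a ":retry" marker in their test name, and that OPENQA_INVESTIGATION_ORIGIN holds the original job's URL ending in /tests/<id>. The helper names, the dict layout and the comment wording are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed marker: the ticket mentions ":retry:" jobs; this also accepts a trailing ":retry".
RETRY_MARKER = re.compile(r":retry\b")
# Assumed format of OPENQA_INVESTIGATION_ORIGIN: a job URL ending in /tests/<id>.
ORIGIN_ID = re.compile(r"/tests/(\d+)/?$")


def is_retry_job(job: dict) -> bool:
    """Return True if the job looks like an openqa-investigate retry job."""
    return bool(RETRY_MARKER.search(job.get("test", "")))


def origin_job_id(job: dict) -> Optional[int]:
    """Extract the original job ID from the (assumed) origin URL setting."""
    url = job.get("settings", {}).get("OPENQA_INVESTIGATION_ORIGIN", "")
    match = ORIGIN_ID.search(url)
    return int(match.group(1)) if match else None


def sporadic_comment(job: dict) -> Optional[Tuple[int, str]]:
    """Return (origin job ID, comment text) for a finished retry job, else None."""
    if not is_retry_job(job):
        return None
    origin = origin_job_id(job)
    if origin is None:
        return None
    if job.get("result") == "passed":
        verdict = "likely sporadic: the retry passed"
    elif job.get("result") == "failed":
        verdict = "likely not sporadic: the retry failed as well"
    else:
        return None  # e.g. incomplete or softfailed: no conclusion drawn here
    return origin, f"Investigation retry job {job['id']}: {verdict}"


# Hypothetical retry job of original job 1001.
retry = {
    "id": 1002,
    "test": "foo:investigate:retry",
    "result": "passed",
    "settings": {"OPENQA_INVESTIGATION_ORIGIN": "https://openqa.example.org/tests/1001"},
}
print(sporadic_comment(retry))  # (1001, 'Investigation retry job 1002: likely sporadic: the retry passed')
```

Returning the target job ID together with the text keeps the decision separate from the openQA API call that would actually post the comment.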
- Description updated
- Subject changed from Identify likely "sporadic" openQA tests with "openqa-investigate" to Identify likely "sporadic" openQA tests with "openqa-investigate" size:M
- Status changed from New to Workable
- Status changed from Workable to In Progress
- Due date set to 2021-09-08
Setting due date based on mean cycle time of SUSE QE Tools
- Status changed from In Progress to Workable
I'm changing the status to Workable because I won't be working in this group for a while
- Due date deleted (2021-09-08)
- Assignee deleted (ilausuch)
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
- Due date set to 2021-09-23
Setting due date based on mean cycle time of SUSE QE Tools
- Related to action #91773: Automatic replacement of openQA job URLs preview of openQA size:M added
Opened a pull request: https://github.com/os-autoinst/scripts/pull/109
Based on this pull request, we still need to add a setting to openqa.ini, such as:
job_done_hook_passed = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-investigate
When the investigation retry job passes, the script comments on the original job.
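In the hook line above, env passes host, exclude_group_regex and grep_timeout to the script as environment variables, while nice and ionice -c idle lower its CPU and I/O priority. Assuming the script matches each job's group name against exclude_group_regex with a regular-expression search (an assumption, not checked against the script), this Python sketch shows which groups the example pattern would skip. The group names are hypothetical.

```python
import re

# Pattern copied from the example openqa.ini hook line (exclude_group_regex).
EXCLUDE_GROUP_REGEX = r".*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*"


def is_excluded(group_name: str) -> bool:
    """Return True if a job group would be skipped, assuming a regex search."""
    return re.search(EXCLUDE_GROUP_REGEX, group_name) is not None


# Hypothetical job group names, for illustration only.
for name in ["Development / SLE 15", "SLE 15 Functional", "Public Cloud Images", "Maintenance: Kernel"]:
    print(f"{name!r}: {'skipped' if is_excluded(name) else 'investigated'}")
```

Note that the match is case-sensitive, so a group named e.g. "kernel" in lowercase would not be excluded by this pattern.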
Hm, executing job hook scripts for every passed job as well might cause significant additional load. In that case, how about turning it around and only acting on failed retry jobs, marking the original job as likely not sporadic because the retry reproduces the failure?
okurz wrote:
Hm, executing job hook scripts for every passed job as well might cause significant additional load. In that case, how about turning it around and only acting on failed retry jobs, marking the original job as likely not sporadic because the retry reproduces the failure?
Another PR that does this directly within openQA: https://github.com/os-autoinst/openQA/pull/4206
That way, no job hook script has to be executed for every passed job.
Xiaojing_liu wrote:
Another PR that does this directly within openQA: https://github.com/os-autoinst/openQA/pull/4206
That way, no job hook script has to be executed for every passed job.
That would still mean additional work in a Minion job. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.
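The load argument can be illustrated with a small Python sketch that counts job-done hook runs for a batch of finished jobs, once with a hook for failed jobs only (which, per the comment above, already runs today) and once with an additional hook for passed jobs. The batch is made up, and the helper names are hypothetical; they do not model openQA's actual hook dispatch.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def hook_invocations(results: Iterable[str], hooked: Set[str]) -> int:
    """Count hook runs: one per finished job whose result has a hook configured."""
    counts = Counter(results)
    return sum(counts[result] for result in hooked)


def failed_only_verdict(retry_result: str) -> Optional[str]:
    """In the failed-only variant the hook sees only failed jobs, so the only
    verdict it can emit is that the original issue is likely not sporadic."""
    if retry_result == "failed":
        return "likely not sporadic: the retry reproduces the failure"
    return None  # a passing retry never reaches the hook in this variant


# Hypothetical batch of finished jobs, for illustration only.
batch = ["passed"] * 90 + ["failed"] * 10
print(hook_invocations(batch, {"failed"}))            # 10: hooks that already run
print(hook_invocations(batch, {"failed", "passed"}))  # 100: one extra run per passed job
print(failed_only_verdict("failed"))
```

The trade-off is visible in the last function: acting only on failed retries adds no hook runs, but a passing retry then produces no comment at all.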
okurz wrote:
That would still mean additional work in a Minion job. I suggested acting on failed jobs because we already spawn hook scripts for every failed job anyway.
OK. I closed the PR in openQA and updated https://github.com/os-autoinst/scripts/pull/109 accordingly.
- Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added
- Status changed from In Progress to Resolved
- Estimated time set to 40.00 h
- Due date deleted (2021-09-23)