action #98862
coordination #102915: [saga][epic] Automated classification of failures
QA - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
Description
User story
When a job fails, we investigate it by triggering new jobs under different conditions, such as a plain re-run, a re-run with the last good build, or a re-run with the last commit in the test distribution.
If a retry job passes, the original job may be an intermittent/sporadic case.
We should provide that assessment as a comment back on the original job.
Acceptance criteria
- AC1: If an openqa-investigate retry job passes, a comment is created on the original job with the assessment that the issue is intermittent or a sporadic test issue
Suggestions
- Call the hook script with a setting to run the hook on passed jobs, e.g. _TRIGGER_JOB_DONE_HOOK=0 or =1
- Consider exiting early from openqa-label-known-issues to ensure it does not handle passed jobs
- Take a look at how we identify likely sporadic issues as a result of failed "retry" jobs in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L136=
- Extend it to react accordingly on a passed retry job
- Ensure that hook scripts in github.com/os-autoinst/scripts/ act as one should expect when called on passed jobs
- Alternative: Extend o3 config to also run job_done_hooks on passed jobs and monitor performance impact
- Same on osd as on o3
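The first and the alternative suggestion both come down to one question: should the hook run for a finished job? A minimal sketch of that decision follows. It assumes that the per-job setting _TRIGGER_JOB_DONE_HOOK overrides the server configuration (0 disables, 1 enables regardless of result) and that the server otherwise runs hooks only for a configured set of results. The function name, its parameters and the default result set are hypothetical and not taken from openQA.

```python
from typing import Iterable, Optional


def should_run_hook(
    result: str,
    trigger_setting: Optional[str],
    configured_results: Iterable[str] = ("failed", "incomplete"),
) -> bool:
    """Decide whether a job_done hook should run for a finished job.

    Assumption: an explicit per-job setting wins over the server config.
    """
    if trigger_setting == "0":
        # Explicitly disabled for this job, regardless of its result.
        return False
    if trigger_setting == "1":
        # Explicitly enabled for this job, also for passed jobs.
        return True
    # No explicit setting: fall back to the (assumed) server configuration.
    return result in set(configured_results)
```

With this shape, a passed retry job carrying the setting `1` reaches the hook without any change to the server-wide configuration, while the alternative suggestion corresponds to adding `passed` to `configured_results`.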
Related issues
History
#1
Updated by Xiaojing_liu over 1 year ago
- Related to action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M added
#2
Updated by okurz over 1 year ago
- Subject changed from Comment back to the origin job according to the investigate job' result to Comment back to the origin job according to the investigate job's result
- Category set to Feature requests
- Target version set to future
- Parent task set to #94105
Thanks. Let's first collect a bit of experience with reacting on failed jobs as done in #95746 before coming back here to react on passed jobs.
#3
Updated by okurz about 1 year ago
- Subject changed from Comment back to the origin job according to the investigate job's result to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes
- Description updated (diff)
- Target version changed from future to Ready
#4
Updated by okurz about 1 year ago
- Copied to action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M added
#5
Updated by okurz about 1 year ago
- Description updated (diff)
- Status changed from New to Blocked
- Assignee set to okurz
blocked by #110518
#6
Updated by mkittler about 1 year ago
- Status changed from Blocked to New
- Assignee deleted (okurz)
#7
Updated by okurz about 1 year ago
With #110518 resolved we can now trigger investigation jobs within openqa-investigate with the new test setting to trigger hooks on itself so that we can evaluate the result and comment accordingly on the original job.
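As a rough illustration of that idea, the settings passed to an investigation job could be assembled like this. The sketch assumes investigation jobs are named by appending ":investigate:<kind>" to the original test name. The helper name and its parameters are hypothetical, and only _TRIGGER_JOB_DONE_HOOK=1 comes from this ticket.

```python
from typing import List


def investigation_job_settings(original_test: str, kind: str = "retry") -> List[str]:
    """Build KEY=VALUE settings for an investigation job (illustrative only)."""
    return [
        # Assumed naming convention that marks the job as an investigation job.
        f"TEST={original_test}:investigate:{kind}",
        # Ask openQA to call the job_done hook for this job, whatever its result.
        "_TRIGGER_JOB_DONE_HOOK=1",
    ]


if __name__ == "__main__":
    print(" ".join(investigation_job_settings("textmode")))
```

The resulting list could be appended to whatever command or API call clones the original job.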
#8
Updated by okurz about 1 year ago
- Target version changed from Ready to future
#11
Updated by cdywan 10 months ago
- Subject changed from Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
- Description updated (diff)
- Status changed from New to Workable
#14
Updated by tinita 6 months ago
Step 1: https://github.com/os-autoinst/scripts/pull/191 - investigate: Skip passed tests
TODO:
- Step 2: add _TRIGGER_JOB_DONE_HOOK=1 for investigate jobs
- Step 3: Post comment for passed investigate jobs
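Taken together, the three steps could lead to hook behaviour along the following lines. This is only a sketch under stated assumptions, not the logic of openqa-investigate: the job is represented as a plain dict, the keys `test`, `result`, `id` and `origin_id` are hypothetical, and retry jobs are assumed to be recognizable by ":investigate:retry" in their test name.

```python
from typing import Optional, Tuple

# Assumed naming convention for investigation jobs.
INVESTIGATE_MARKER = ":investigate:"
RETRY_MARKER = ":investigate:retry"

# (action, id of the job to act on, comment text)
Action = Tuple[str, Optional[int], Optional[str]]


def handle_finished_job(job: dict) -> Action:
    """Return what the hook should do for a finished job."""
    name = job.get("test", "")
    passed = job.get("result") == "passed"

    if passed and RETRY_MARKER in name:
        # Step 3: a passed retry hints at a sporadic issue in the original job.
        text = (
            f"Automatic investigation: retry job {job['id']} passed, "
            "likely a sporadic test issue"
        )
        return ("comment", job.get("origin_id"), text)
    if passed:
        # Step 1: passed tests need no investigation.
        return ("skip", None, None)
    if INVESTIGATE_MARKER in name:
        # Assumption: investigation jobs are not investigated again.
        return ("skip", None, None)
    return ("investigate", job.get("id"), None)
```

Returning an action instead of posting the comment directly keeps the decision testable without a running openQA instance.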
#15
Updated by openqa_review 6 months ago
- Due date set to 2023-01-05
Setting due date based on mean cycle time of SUSE QE Tools
#19
Updated by tinita 5 months ago
https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test
#20
Updated by tinita 5 months ago
Unrelated, but I stumbled over the reused variable name: https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate
#21
Updated by tinita 5 months ago
https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test (merged)
https://github.com/os-autoinst/scripts/pull/201 - Make wrapper scripts for batch processing (merged)
https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate (merged)
https://github.com/os-autoinst/scripts/pull/203 - Fix unbound variable error (fix for 203) (merged)
Currently monitoring gru logs...
#24
Updated by tinita 5 months ago
Fix regression: https://github.com/os-autoinst/scripts/pull/205
#25
Updated by tinita 5 months ago
and another fix: https://github.com/os-autoinst/scripts/pull/206
#26
Updated by tinita 5 months ago
Ready: https://github.com/os-autoinst/scripts/pull/204 - Post comment about sporadic failure
#27
Updated by okurz 5 months ago
- Due date deleted (2023-01-27)
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/scripts/pull/204 is merged. Verification in production in https://openqa.opensuse.org/tests/3059836#comments looks good. I assume we will find out in the parent ticket or related tickets if the hook script does not act as expected. As we did not change any server-specific configuration for this, I assume we can resolve this right away, which also keeps us well within the twice-extended due date.
#29
Updated by tinita 5 months ago
- Status changed from In Progress to Feedback
Now https://github.com/os-autoinst/scripts/pull/193 (Set _TRIGGER_JOB_DONE_HOOK=1 for investigation jobs) is ready for review
#31
Updated by tinita 4 months ago
I checked passed retry jobs and it seems like the hook is not working:
https://openqa.opensuse.org/tests/3068720#settings
This has the correct setting _TRIGGER_JOB_DONE_HOOK but apparently it didn't trigger the hook. Will check minion logs/db if there is more information.
#32
Updated by tinita 4 months ago
I created a job myself with the setting, and it didn't appear in the minion dashboard at all:
https://openqa.opensuse.org/tests/3069790
So the logic that runs the hook in lib/OpenQA/Task/Job/FinalizeResults.pm might be broken. Will verify locally what's happening.
#34
Updated by tinita 4 months ago
Ok, it worked on o3:
https://openqa.opensuse.org/tests/3070473#comment-382171
https://openqa.opensuse.org/tests/3070359#comment-382131
So I enabled it on osd too.
#36
Updated by tinita 4 months ago
okurz wrote:
You must do that with salt though to make permanent
oops.
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/789
#37
Updated by tinita 4 months ago
Also worked on osd: https://openqa.suse.de/tests/10385488#comment-732429
Only waiting for https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/789 to be merged
#40
Updated by tinita 4 months ago
- Related to action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M added