action #98862
coordination #102915 (closed): [saga][epic] Automated classification of failures
QA - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
Description
User story
When a job fails, we investigate it by triggering new jobs under different conditions, such as a plain re-run, a re-run with the last good build, or a re-run with the last commit in the test distribution.
If a retry job passes, the original failure may be an intermittent/sporadic issue.
We should provide that assessment as a comment on the original job.
Acceptance criteria
- AC1: If an openqa-investigate retry job passes, a comment is created on the original job with the assessment that the issue is intermittent or a sporadic test issue
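A minimal sketch of the decision described in the user story and AC1, in Python for illustration only. The Job type, the ":investigate:retry" name suffix and the comment wording are hypothetical assumptions, not the actual code of openqa-investigate or the hook scripts.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical naming convention for plain retry investigation jobs.
RETRY_SUFFIX = ":investigate:retry"


@dataclass
class Job:
    job_id: int
    test_name: str
    result: str  # openQA result string, e.g. "passed", "failed", "softfailed"


def sporadic_comment(original: Job, investigation: Job) -> Optional[str]:
    """Return the comment to post on the original job, or None if none applies."""
    # Only a plain retry (same conditions) says something about sporadic
    # behaviour; last-good-build/last-good-tests jobs answer other questions.
    if not investigation.test_name.endswith(RETRY_SUFFIX):
        return None
    # In this sketch only failed original jobs are investigated.
    if original.result != "failed":
        return None
    # A passed retry of a failed job suggests an intermittent/sporadic issue.
    if investigation.result != "passed":
        return None
    return (
        f"Investigation retry job {investigation.job_id} passed, so the failure "
        f"of job {original.job_id} is likely an intermittent/sporadic test issue."
    )


if __name__ == "__main__":
    failed = Job(1, "foo", "failed")
    retry = Job(2, "foo:investigate:retry", "passed")
    print(sporadic_comment(failed, retry))
```

Only a "passed" retry counts here; whether a "softfailed" retry should count as well is a separate design decision.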
Suggestions
- Call the hook script with a setting to run the hook on passed jobs, e.g. _TRIGGER_JOB_DONE_HOOK=0 or _TRIGGER_JOB_DONE_HOOK=1
- Consider exiting early from openqa-label-known-issues to ensure it does not handle passed jobs
- Take a look at how we identify likely sporadic issues as a result of failed "retry" jobs in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L136=
- Extend it to react accordingly to a passed retry job
- Ensure that hook scripts in github.com/os-autoinst/scripts/ behave as expected when called on passed jobs
- Alternative: Extend the o3 config to also run job_done_hooks on passed jobs and monitor the performance impact
- Do the same on osd as on o3
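The suggestions about exiting early on passed jobs and making the hook scripts behave sensibly when called on passed jobs could be modelled as below. This is a hypothetical dispatcher that only returns the names of the actions a done-hook would take; the action names and the rule that investigation jobs are not investigated again are assumptions for illustration.

```python
from typing import List


def done_hook_actions(result: str, is_investigation_job: bool) -> List[str]:
    """Return the (hypothetical) actions a done-hook takes for a finished job."""
    if result == "passed":
        # Nothing to label on a passed job, so exit early; only a passed
        # investigation job matters, to comment on its original job.
        return ["comment-on-original"] if is_investigation_job else []
    actions = ["label-known-issues"]
    # Assumption: investigation jobs are not investigated again, to avoid loops.
    if not is_investigation_job:
        actions.append("investigate")
    return actions
```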
Updated by Xiaojing_liu about 3 years ago
- Related to action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M added
Updated by okurz about 3 years ago
- Subject changed from Comment back to the origin job according to the investigate job' result to Comment back to the origin job according to the investigate job's result
- Category set to Feature requests
- Target version set to future
- Parent task set to #94105
Thanks. Let's first collect some experience with reacting to failed jobs as done in #95746 before coming back here to react to passed jobs.
Updated by okurz over 2 years ago
- Subject changed from Comment back to the origin job according to the investigate job's result to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes
- Description updated (diff)
- Target version changed from future to Ready
Updated by okurz over 2 years ago
- Copied to action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M added
Updated by mkittler over 2 years ago
- Status changed from Blocked to New
- Assignee deleted (okurz)
Updated by okurz over 2 years ago
With #110518 resolved, openqa-investigate can now trigger investigation jobs with the new test setting so that the hook is also triggered on the investigation jobs themselves. This lets us evaluate their result and comment accordingly on the original job.
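A sketch of how the settings for an investigation job might be assembled so that the hook also runs on the investigation job itself. It assumes the settings are passed as KEY=VALUE arguments, as openqa-clone-job accepts them; the TEST naming scheme is hypothetical, and only _TRIGGER_JOB_DONE_HOOK=1 comes from this ticket.

```python
from typing import Dict, List


def investigation_settings(test_name: str, kind: str) -> Dict[str, str]:
    """Settings overrides for a hypothetical investigation job of the given kind."""
    return {
        # Hypothetical naming scheme that marks the job as an investigation job.
        "TEST": f"{test_name}:investigate:{kind}",
        # Ask openQA to run the job_done_hook on this job even if it passes.
        "_TRIGGER_JOB_DONE_HOOK": "1",
    }


def to_clone_args(settings: Dict[str, str]) -> List[str]:
    """Render settings as KEY=VALUE arguments, sorted for stable output."""
    return [f"{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    print(to_clone_args(investigation_settings("foo", "retry")))
```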
Updated by livdywan over 2 years ago
- Subject changed from Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by tinita almost 2 years ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita almost 2 years ago
Step 1: https://github.com/os-autoinst/scripts/pull/191 - investigate: Skip passed tests
TODO:
- Step 2: add _TRIGGER_JOB_DONE_HOOK=1 for investigate jobs
- Step 3: Post comment for passed investigate jobs
Updated by openqa_review almost 2 years ago
- Due date set to 2023-01-05
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz almost 2 years ago
- Due date changed from 2023-01-05 to 2023-01-20
Christmas grace due date bump :)
Updated by livdywan almost 2 years ago
Updated by tinita almost 2 years ago
I have been busy with other things, e.g. setting up a new laptop.
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test
Updated by tinita almost 2 years ago
Unrelated, but I stumbled over the reused variable name: https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate
Updated by tinita almost 2 years ago
https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test (merged)
https://github.com/os-autoinst/scripts/pull/201 - Make wrapper scripts for batch processing (merged)
https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate (merged)
https://github.com/os-autoinst/scripts/pull/203 - Fix unbound variable error (fix for 203) (merged)
Currently monitoring gru logs...
Updated by tinita almost 2 years ago
- Due date changed from 2023-01-20 to 2023-01-27
I still need a little bit of time for the actual feature, hoping that it is now a bit easier to test.
Updated by tinita almost 2 years ago
Updated by tinita almost 2 years ago
Fix regression: https://github.com/os-autoinst/scripts/pull/205
Updated by tinita almost 2 years ago
and another fix: https://github.com/os-autoinst/scripts/pull/206
Updated by tinita almost 2 years ago
Ready: https://github.com/os-autoinst/scripts/pull/204 - Post comment about sporadic failure
Updated by okurz almost 2 years ago
- Due date deleted (2023-01-27)
- Status changed from In Progress to Resolved
https://github.com/os-autoinst/scripts/pull/204 is merged. Verification in production in https://openqa.opensuse.org/tests/3059836#comments looks good. I assume we will find out in the parent ticket or related tickets in case the hook script does not behave as expected. As we did not change any server-specific configuration for this, I assume we can resolve this right away, which also keeps us well within the twice-extended due date.
Updated by okurz almost 2 years ago
- Due date set to 2023-01-27
- Status changed from Resolved to In Progress
We still have to enable the triggering of jobs
Updated by tinita almost 2 years ago
- Status changed from In Progress to Feedback
Now https://github.com/os-autoinst/scripts/pull/193 (Set _TRIGGER_JOB_DONE_HOOK=1 for investigation jobs) is ready for review
Updated by tinita almost 2 years ago
Updated by tinita almost 2 years ago
I checked passed retry jobs and it seems like the hook is not working:
https://openqa.opensuse.org/tests/3068720#settings
This has the correct setting _TRIGGER_JOB_DONE_HOOK, but apparently it didn't trigger the hook. Will check the minion logs/db for more information.
Updated by tinita almost 2 years ago
I created a job myself with the setting, and it didn't appear in the minion dashboard at all:
https://openqa.opensuse.org/tests/3069790
So the logic that runs the hook in lib/OpenQA/Task/Job/FinalizeResults.pm might be broken. Will verify locally what's happening.
Updated by tinita almost 2 years ago
Ok, I also needed to configure job_done_hook in openqa.ini. I did that now on o3 and restarted gru.
Let's see what happens.
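A simplified model of the behaviour observed above: setting _TRIGGER_JOB_DONE_HOOK on the job was not enough until job_done_hook was also configured in openqa.ini. This is not openQA's actual logic in lib/OpenQA/Task/Job/FinalizeResults.pm; the per-result key job_done_hook_<result> (assumed to live in the [hooks] section) and the handling of _TRIGGER_JOB_DONE_HOOK=0 are assumptions of this model.

```python
from typing import Dict, Optional


def hook_for(result: str, job_settings: Dict[str, str],
             hooks_config: Dict[str, str]) -> Optional[str]:
    """Return the hook command to run for a finished job, or None (simplified model)."""
    trigger = job_settings.get("_TRIGGER_JOB_DONE_HOOK")
    # Assumption: an explicit 0 (or empty value) disables the hook for this job.
    if trigger in ("0", ""):
        return None
    # Result-specific hook from the assumed [hooks] section, e.g. job_done_hook_failed.
    specific = hooks_config.get(f"job_done_hook_{result}")
    if specific:
        return specific
    # The job setting only selects the generic job_done_hook; if that key is
    # missing from openqa.ini, nothing runs, matching what was seen on o3.
    if trigger:
        return hooks_config.get("job_done_hook")
    return None
```

For example, hook_for("passed", {"_TRIGGER_JOB_DONE_HOOK": "1"}, {}) returns None until a job_done_hook entry is added to the configuration.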
Updated by tinita almost 2 years ago
Ok, it worked on o3:
https://openqa.opensuse.org/tests/3070473#comment-382171
https://openqa.opensuse.org/tests/3070359#comment-382131
So I enabled it on osd too.
Updated by okurz almost 2 years ago
So I enabled it on osd too.
You must do that with salt, though, to make it permanent.
Updated by tinita almost 2 years ago
okurz wrote:
You must do that with salt, though, to make it permanent.
oops.
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/789
Updated by tinita almost 2 years ago
Also worked on osd: https://openqa.suse.de/tests/10385488#comment-732429
Only waiting for https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/789 to be merged
Updated by tinita almost 2 years ago
- Related to action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M added