action #98862

closed

coordination #102915: [saga][epic] Automated classification of failures

QA - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M

Added by Xiaojing_liu over 2 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Feature requests
Target version:
Start date: 2021-09-18
Due date:
% Done: 0%
Estimated time:

Description

Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes

User story

When a job fails, we will investigate the job by triggering a new job with different conditions, such as re-run, re-run with the last good build, or last commit in the test distribution.
If a retry job passes, the original failure may be an intermittent/sporadic issue.
We should provide that assessment as a comment back on the original job.

Acceptance criteria

  • AC1: If an openqa-investigate retry job passes, a comment is created on the original job with the assessment that the issue is intermittent or a sporadic test issue

Suggestions

  • Call the hook script with a setting to run the hook on passed jobs, e.g. _TRIGGER_JOB_DONE_HOOK=0 or =1
  • Consider exiting early from openqa-label-known-issues to ensure it does not handle passed jobs
  • Take a look at how we identify likely sporadic issues as a result of failed "retry" jobs in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L136=
  • Extend it to react accordingly to a passed retry
  • Ensure that hook scripts in github.com/os-autoinst/scripts/ act as one should expect when called on passed jobs
  • Alternative: Extend o3 config to also run job_done_hooks on passed jobs and monitor performance impact
  • Same on osd as on o3
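
The acceptance criterion and the suggestions above amount to a small decision: only a plain retry that passed justifies a "sporadic" comment on the original job. The following is a minimal sketch of that decision in Python, not the implementation in os-autoinst/scripts (which is written in shell). The names `InvestigateJob` and `sporadic_comment`, the `:investigate:retry` name suffix, and the comment wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: retry jobs can be recognized by this name suffix.
RETRY_SUFFIX = ":investigate:retry"


@dataclass
class InvestigateJob:
    """Hypothetical minimal view of a finished investigation job."""
    job_id: int
    name: str
    result: str          # e.g. "passed", "failed", "softfailed"
    origin_job_id: int   # the job that originally failed


def sporadic_comment(job: InvestigateJob) -> Optional[str]:
    """Return comment text for the original job, or None if nothing applies."""
    if not job.name.endswith(RETRY_SUFFIX):
        # Other investigation variants (last good build, last good tests)
        # say something different than "sporadic", so they are skipped here.
        return None
    if job.result != "passed":
        # Non-passed retries are out of scope for this sketch.
        return None
    return (
        f"Retry job {job.job_id} passed without any changes: "
        "likely an intermittent or sporadic test issue."
    )
```

A caller would post the returned text on `job.origin_job_id` and do nothing when the function returns `None`.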

Related issues 3 (0 open, 3 closed)

  • Related to QA - action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M (Resolved, Xiaojing_liu, 2021-07-20)
  • Related to openQA Project - action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M (Resolved, tinita, 2023-02-10)
  • Copied to openQA Project - action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M (Resolved, mkittler, 2021-09-18)

Actions #1

Updated by Xiaojing_liu over 2 years ago

  • Related to action #95746: Identify likely "sporadic" openQA tests with "openqa-investigate" size:M added
Actions #2

Updated by okurz over 2 years ago

  • Subject changed from Comment back to the origin job according to the investigate job' result to Comment back to the origin job according to the investigate job's result
  • Category set to Feature requests
  • Target version set to future
  • Parent task set to #94105

Thanks. Let's first collect some experience with reacting to failed jobs, as done in #95746, before coming back here to react to passed jobs.

Actions #3

Updated by okurz almost 2 years ago

  • Subject changed from Comment back to the origin job according to the investigate job's result to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes
  • Description updated (diff)
  • Target version changed from future to Ready
Actions #4

Updated by okurz almost 2 years ago

  • Copied to action #110518: Call job_done_hooks if requested by test setting (not only openQA config as done so far) size:M added
Actions #5

Updated by okurz almost 2 years ago

  • Description updated (diff)
  • Status changed from New to Blocked
  • Assignee set to okurz

blocked by #110518

Actions #6

Updated by mkittler almost 2 years ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
Actions #7

Updated by okurz almost 2 years ago

With #110518 resolved we can now trigger investigation jobs within openqa-investigate with the new test setting to trigger hooks on itself so that we can evaluate the result and comment accordingly on the original job.
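
As an illustration of the idea in this comment, the sketch below assembles the settings for an investigation job so that the hook also runs when that job passes. It is a hedged example: `_TRIGGER_JOB_DONE_HOOK` comes from this ticket, while the function name `investigation_settings` and the `TEST=<name>:investigate:<kind>` naming scheme are assumptions, and the way openqa-investigate actually passes settings may differ.

```python
from typing import Dict, List, Optional

HOOK_SETTING = "_TRIGGER_JOB_DONE_HOOK"  # setting name taken from this ticket


def investigation_settings(
    base_name: str,
    kind: str,
    extra: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build KEY=VALUE settings for a cloned investigation job.

    `kind` is a hypothetical label such as "retry".
    """
    settings = {
        # Assumed naming scheme for investigation jobs.
        "TEST": f"{base_name}:investigate:{kind}",
        # Ask openQA to run the job-done hook even if this job passes.
        HOOK_SETTING: "1",
    }
    settings.update(extra or {})
    return [f"{key}={value}" for key, value in settings.items()]
```

The resulting list could be appended to whatever command clones the job, so that every investigation job carries the hook setting without changing the server-wide configuration for passed jobs.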

Actions #8

Updated by okurz almost 2 years ago

  • Target version changed from Ready to future
Actions #9

Updated by okurz over 1 year ago

  • Target version changed from future to Ready
Actions #11

Updated by livdywan over 1 year ago

  • Subject changed from Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes to Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #12

Updated by okurz over 1 year ago

  • Priority changed from Low to High
Actions #13

Updated by tinita about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #14

Updated by tinita about 1 year ago

Step 1: https://github.com/os-autoinst/scripts/pull/191 - investigate: Skip passed tests

TODO:

  • Step 2: add _TRIGGER_JOB_DONE_HOOK=1 for investigate jobs
  • Step 3: Post comment for passed investigate jobs
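
For Step 3, one way to post the comment is through the openQA API route for job comments. The sketch below only builds the command line and leaves execution to the caller. It assumes that `openqa-cli api` accepts `--host`, `-X POST`, and `key=value` parameters, and that the route is `jobs/<id>/comments` with a `text` parameter; the function names are hypothetical, and none of this is taken from the pull requests linked in this ticket.

```python
import subprocess
from typing import List


def comment_command(host: str, job_id: int, text: str) -> List[str]:
    """Build an argv list that would post `text` as a comment on a job."""
    return [
        "openqa-cli", "api",
        "--host", host,
        "-X", "POST",
        f"jobs/{job_id}/comments",   # assumed API route for job comments
        f"text={text}",
    ]


def post_comment(host: str, job_id: int, text: str) -> None:
    """Run the command; requires openqa-cli and valid API credentials."""
    subprocess.run(comment_command(host, job_id, text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the comment text contains spaces or quotes.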
Actions #15

Updated by openqa_review about 1 year ago

  • Due date set to 2023-01-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #16

Updated by okurz about 1 year ago

  • Due date changed from 2023-01-05 to 2023-01-20

christmas grace due date bump :)

Actions #18

Updated by tinita about 1 year ago

I have been busy with other things, e.g. setting up a new laptop.

Actions #19

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test

Actions #20

Updated by tinita about 1 year ago

Unrelated, but I stumbled over the reused variable name: https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate

Actions #21

Updated by tinita about 1 year ago

https://github.com/os-autoinst/scripts/pull/200 - Make hook script easier to test (merged)
https://github.com/os-autoinst/scripts/pull/201 - Make wrapper scripts for batch processing (merged)
https://github.com/os-autoinst/scripts/pull/202 - Reuse job_data in openqa-investigate (merged)
https://github.com/os-autoinst/scripts/pull/203 - Fix unbound variable error (fix for 203) (merged)

Currently monitoring gru logs...

Actions #22

Updated by tinita about 1 year ago

  • Due date changed from 2023-01-20 to 2023-01-27

I still need a little bit of time for the actual feature, hoping that it is now a bit easier to test.

Actions #26

Updated by tinita about 1 year ago

Ready: https://github.com/os-autoinst/scripts/pull/204 - Post comment about sporadic failure

Actions #27

Updated by okurz about 1 year ago

  • Due date deleted (2023-01-27)
  • Status changed from In Progress to Resolved

https://github.com/os-autoinst/scripts/pull/204 is merged. Verification in production in https://openqa.opensuse.org/tests/3059836#comments looks good. I assume we will find out in the parent ticket or related tickets if the hook script does not act as expected. As we did not change any server-specific configuration for this, I assume we can resolve this right away, which also keeps us well within the twice-extended due date.

Actions #28

Updated by okurz about 1 year ago

  • Due date set to 2023-01-27
  • Status changed from Resolved to In Progress

We still have to enable the triggering of jobs

Actions #29

Updated by tinita about 1 year ago

  • Status changed from In Progress to Feedback

Now https://github.com/os-autoinst/scripts/pull/193 (Set _TRIGGER_JOB_DONE_HOOK=1 for investigation jobs) is ready for review

Actions #31

Updated by tinita about 1 year ago

I checked passed retry jobs and it seems like the hook is not working:
https://openqa.opensuse.org/tests/3068720#settings
This has the correct setting _TRIGGER_JOB_DONE_HOOK but apparently it didn't trigger the hook. Will check minion logs/db if there is more information.

Actions #32

Updated by tinita about 1 year ago

I created a job myself with the setting, and it didn't appear in the minion dashboard at all:
https://openqa.opensuse.org/tests/3069790
So the logic that runs the hook in lib/OpenQA/Task/Job/FinalizeResults.pm might be broken. Will verify locally what's happening.

Actions #33

Updated by tinita about 1 year ago

OK, I also needed to configure job_done_hook in openqa.ini. I did that now on o3 and restarted gru.
Let's see what happens.
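
The behavior observed in the last three comments can be modeled as follows: the per-job setting alone is not enough, because a generic `job_done_hook` must also be configured on the server. This is a simplified Python model of that observation, not the Perl code in lib/OpenQA/Task/Job/FinalizeResults.pm. The function name `select_hook` and the result-specific key `job_done_hook_<result>` are assumptions; `job_done_hook` and `_TRIGGER_JOB_DONE_HOOK` are the names used in this ticket.

```python
from typing import Mapping, Optional


def select_hook(
    hooks: Mapping[str, str],
    result: str,
    trigger_setting: Optional[str],
) -> Optional[str]:
    """Pick the hook script to run for a finished job, or None.

    hooks:           hook-related keys from the server config (openqa.ini)
    result:          job result, e.g. "failed" or "passed"
    trigger_setting: value of _TRIGGER_JOB_DONE_HOOK on the job, if set
    """
    if trigger_setting in ("0", ""):
        return None  # the job explicitly opted out of hooks
    # Assumed key: a hook configured for this specific result.
    hook = hooks.get(f"job_done_hook_{result}")
    if not hook and trigger_setting == "1":
        # The job asked for a hook, so fall back to the generic one.
        # Without a configured job_done_hook nothing runs at all.
        hook = hooks.get("job_done_hook")
    return hook or None
```

With only a result-specific hook for failed jobs configured, `select_hook(hooks, "passed", "1")` returns `None`, which matches the passed retry job that never showed up in the minion dashboard. Adding `job_done_hook` to the mapping makes the same call return that script.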

Actions #35

Updated by okurz about 1 year ago

So I enabled it on osd too.

You must do that with salt, though, to make it permanent.

Actions #36

Updated by tinita about 1 year ago

okurz wrote:

You must do that with salt, though, to make it permanent.

oops.
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/789

Actions #38

Updated by tinita about 1 year ago

  • Status changed from Feedback to Resolved

Merged

Actions #39

Updated by okurz about 1 year ago

  • Due date deleted (2023-01-27)
Actions #40

Updated by tinita about 1 year ago

  • Related to action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M added