action #124274
openQA reports non-sporadic issue when retry job just softfailed size:M
Description
Motivation
I got the feedback that the retry job failed for osd#10460004.
Investigate retry job: https://openqa.suse.de/t10461879 failed, likely not a sporadic failure
However, the retry job only softfailed, and the step that failed previously is entirely green:
osd#10461879
Acceptance Criteria
- AC1: A softfailed retry job is not handled as a failure in this case
Suggestions
- It could be a one line change: https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L191
Updated by okurz almost 2 years ago
- Tags set to reactive work
- Target version set to Ready
Updated by mkittler almost 2 years ago
We have job_done_hook = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
configured, so the hook script would run regardless of the job's result. I was wondering where we take care not to run into the "Investigate retry job: … failed" assumption for passed/softfailed jobs. Since we have no job_done_hook_enable_… = 1 settings, the hook script actually only runs for failed, incomplete or timeout_exceeded results.
Since the job has _TRIGGER_JOB_DONE_HOOK=1, the generic hook script is triggered for this particular job after all (regardless of the result). We apparently don't do any extra checks in openqa-label-known-issues-and-investigate-hook to avoid running into the "Investigate retry job: … failed" assumption, so this is what's happening. Supposedly we should have an extra check there. I'm not sure where the _TRIGGER_JOB_DONE_HOOK=1 job setting comes from and why it was added.
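The triggering rule described in this comment can be sketched roughly as follows. This is only an illustration in Python, not openQA's actual implementation: the function name hook_is_triggered, the enabled_results parameter (a stand-in for the job_done_hook_enable_… settings) and the dictionary representation of job settings are assumptions made for the example.

```python
# Results for which the hook runs by default, as stated in the comment above.
DEFAULT_HOOK_RESULTS = frozenset({"failed", "incomplete", "timeout_exceeded"})


def hook_is_triggered(result, job_settings, enabled_results=frozenset()):
    """Return True if the job_done_hook would run for a finished job.

    enabled_results is a hypothetical stand-in for results enabled through
    "job_done_hook_enable_... = 1" settings (none are configured here).
    """
    # The per-job setting forces the hook regardless of the result.
    if str(job_settings.get("_TRIGGER_JOB_DONE_HOOK", "0")) == "1":
        return True
    return result in DEFAULT_HOOK_RESULTS or result in enabled_results


if __name__ == "__main__":
    # A softfailed job normally does not trigger the hook ...
    print(hook_is_triggered("softfailed", {}))
    # ... but a retry job carrying _TRIGGER_JOB_DONE_HOOK=1 does.
    print(hook_is_triggered("softfailed", {"_TRIGGER_JOB_DONE_HOOK": "1"}))
```

Under these assumptions, a softfailed retry job reaches the hook script only because of its job setting, which is why the script itself would need the extra result check.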
Updated by tinita almost 2 years ago
mkittler wrote:
I'm not sure where the _TRIGGER_JOB_DONE_HOOK=1 job setting comes from and why it was added.
_TRIGGER_JOB_DONE_HOOK=1 was added by me as part of #98862 for investigate:retry jobs.
We need to run the hook script in order to report when a retry job passed.
For that I also needed to enable job_done_hook, and I guess this is now also called for softfailed. Should I rather configure job_done_hook_passed instead?
Updated by tinita almost 2 years ago
- Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added
Updated by mkittler almost 2 years ago
- Assignee set to mkittler
tinita wrote:
For that I also needed to enable job_done_hook, and I guess this is now also called for softfailed. Should I rather configure job_done_hook_passed instead?
I don't think so. The "if openqa-investigate retry job passes" part in #98862 is likely also supposed to include softfails.
I suppose I will just add a check to skip writing this comment for passed/softfailed jobs.
Updated by mkittler almost 2 years ago
- Assignee deleted (mkittler)
Or maybe let's estimate it first.
Updated by mkittler almost 2 years ago
It would be good to estimate this with @okurz to clarify whether we can really treat "softfailed" as "passed" here.
Updated by okurz almost 2 years ago
In general, what users commonly expect is that the investigation jobs tell whether the same issue happens again. We make the assumption that if a job fails again, it is likely the same issue, even though that will not be true in general. IMHO that assumption is still fine for the sake of openqa-investigate. Regarding failed vs. softfailed: I assume we only trigger openqa-investigate in the first place for failed jobs, hence we want to know whether retry jobs fail. So in my understanding all jobs with an "ok" result should be treated the same.
Updated by okurz almost 2 years ago
mkittler wrote:
I suppose I will just add a check to skip writing this comment for passed/softfailed jobs.
As discussed in the weekly on 2023-03-10, we clarified that we do have the feature to write a comment if a job passes, so we should ensure that any "ok" result is treated the same. As a softfail effectively means "known issue", the reason for the job failure cannot be the same as the original "new, unreviewed issue".
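The rule agreed on here (any "ok" result of a retry job is treated like a pass) could look roughly like the sketch below. It is a Python illustration only, not the code of openqa-investigate: the function name retry_comment and the wording of the comment for the "ok" case are hypothetical, while the wording for the failing case is copied from the comment quoted in the description.

```python
# Results counted as "ok"; per the discussion, softfailed is treated like passed.
OK_RESULTS = frozenset({"passed", "softfailed"})


def retry_comment(retry_url, result):
    """Build the comment for the original job from the retry job's result."""
    if result in OK_RESULTS:
        # Hypothetical wording; the real text used by openqa-investigate may differ.
        return f"Investigate retry job: {retry_url} {result}, likely a sporadic failure"
    # Wording as quoted in the ticket description.
    return f"Investigate retry job: {retry_url} failed, likely not a sporadic failure"


if __name__ == "__main__":
    url = "https://openqa.suse.de/t10461879"
    print(retry_comment(url, "softfailed"))
    print(retry_comment(url, "failed"))
```

With such a check, the softfailed retry job from the description would no longer be reported as "likely not a sporadic failure".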
Updated by livdywan over 1 year ago
- Subject changed from openQA reports non-sporadic issue when retry job just softfailed to openQA reports non-sporadic issue when retry job just softfailed size:M
- Description updated
- Status changed from New to Workable
Updated by tinita over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to tinita
Updated by tinita over 1 year ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/scripts/pull/221 Treat softfailed as passed in openqa-investigate
Updated by tinita over 1 year ago
- Status changed from Feedback to Resolved
The PR was merged, resolving. @clanig let us know if you see something unexpected again, thanks