action #124274: openQA reports non-sporadic issue when retry job just softfailed size:M - openQA Project (public) - openSUSE Project Management Tool

action #124274

closed

openQA reports non-sporadic issue when retry job just softfailed size:M

Added by clanig about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee: tinita
Category: Regressions/Crashes
Target version: Ready
Start date: 2023-02-10
Due date:
% Done:
Estimated time:
Tags: reactive work

Description

Motivation

I received the following feedback on osd#10460004, saying that the retry job failed:

Investigate retry job: https://openqa.suse.de/t10461879 failed, likely not a sporadic failure

However, the retry job only softfailed, and the step that failed previously is entirely green: osd#10461879

Acceptance Criteria

AC1: A softfailed retry job is not handled as a failure in this case

Suggestions

It could be a one-line change: https://github.com/os-autoinst/scripts/blob/master/openqa-investigate#L191

Related issues 1 (0 open — 1 closed)

Updated by okurz about 2 years ago

Tags set to reactive work
Target version set to Ready

Updated by mkittler about 2 years ago

We have configured

job_done_hook = env host=openqa.suse.de exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization).*' grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook

~~so the hook script runs regardless of the job's result. I'm wondering where we take care not to run into the "Investigate retry job: … failed" assumption for passed/softfailed jobs.~~

~~Since we have no job_done_hook_enable_… = 1 settings the hook script is actually only running for failed, incomplete or timeout_exceeded results.~~

Since the job has _TRIGGER_JOB_DONE_HOOK=1, the generic hook script is triggered for this particular job after all (regardless of the result). We apparently don't do any extra checks in openqa-label-known-issues-and-investigate-hook to avoid running into the "Investigate retry job: … failed" assumption, so this is what's happening. Presumably we should have an extra check there. I'm not sure where the _TRIGGER_JOB_DONE_HOOK=1 job setting comes from or why it was added.
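
The trigger conditions described in this comment can be sketched roughly as follows. This is a simplified illustration in Python, not openQA's actual implementation: the function name `should_run_job_done_hook`, the `enabled_results` parameter (a stand-in for results enabled through per-result configuration), and the dictionary representation of job settings are all assumptions made for the example.

```python
# Results for which the globally configured hook runs by default,
# as stated in the comment above.
DEFAULT_HOOK_RESULTS = {"failed", "incomplete", "timeout_exceeded"}


def should_run_job_done_hook(result, job_settings, enabled_results=frozenset()):
    """Return True if the job done hook would be triggered for this job.

    `enabled_results` is a hypothetical stand-in for results that were
    explicitly enabled via per-result configuration.
    """
    # A job-level setting forces the hook regardless of the result.
    if str(job_settings.get("_TRIGGER_JOB_DONE_HOOK", "0")) == "1":
        return True
    return result in DEFAULT_HOOK_RESULTS or result in enabled_results


# A softfailed retry job carrying the setting still reaches the hook script,
# so the hook script itself has to look at the result.
print(should_run_job_done_hook("softfailed", {"_TRIGGER_JOB_DONE_HOOK": "1"}))
print(should_run_job_done_hook("softfailed", {}))
```

Under these assumptions, a softfailed job reaches the hook script only when it carries the job setting, which matches the situation reported here.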

Updated by tinita about 2 years ago

mkittler wrote:

> I'm not sure where the _TRIGGER_JOB_DONE_HOOK=1 job settings comes from and why it was added.

_TRIGGER_JOB_DONE_HOOK=1 was added by me as part of #98862 for investigate:retry jobs.

We need to run the hook script in order to report when a retry job passed.
For that I also needed to enable job_done_hook, and I guess this is now also called for softfailed. Should I rather configure job_done_hook_passed instead?

Updated by tinita about 2 years ago

Related to action #98862: Comment about intermittent/sporadic test issues on original job if openqa-investigate retry job passes size:M added

Updated by mkittler about 2 years ago

Assignee set to mkittler

> For that I also needed to enable job_done_hook, and I guess this is now also called for softfailed. Should I rather configure job_done_hook_passed instead?

I don't think so. The "if openqa-investigate retry job passes" part in #98862 is likely also supposed to include softfails.

I suppose I will just add a check to skip writing this comment for passed/softfailed jobs.

Updated by mkittler about 2 years ago

Assignee deleted (~~mkittler~~)

Or maybe let's estimate it first.

Updated by mkittler about 2 years ago

It would be good to estimate this with @okurz to clarify whether we can really treat "softfailed" as "passed" here.

Updated by okurz about 2 years ago

In general, what users commonly expect is that the investigation jobs tell whether the *same* issue happens again. We make the assumption that if a job fails again, it is likely the same issue, even though that will not be generally true. IMHO that assumption is still fine for the sake of openqa-investigate. Regarding failed vs. softfailed: I assume we only trigger openqa-investigate for failed jobs in the first place, hence we want to know whether retry jobs fail. So in my understanding all jobs with an "ok" result should be treated the same.

Updated by okurz about 2 years ago

mkittler wrote:

> I suppose I will just add a check to skip writing this comments for passed/softfailed jobs.

In the weekly on 2023-03-10 we clarified that we do have the feature to write a comment if a job passes, so we should ensure that we treat any "ok" result the same. Since a softfail effectively means "known issue", the reason for the job failure cannot be the same as the original "new, unreviewed issue".
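
A minimal sketch of the behaviour agreed on here, written in Python for illustration only; it is not the code of openqa-investigate. The function name `retry_comment` and the wording of the comment for the "ok" case are hypothetical. Only the "failed, likely not a sporadic failure" wording is taken from the issue description.

```python
# Results treated as "ok"; per the discussion, softfailed counts as ok.
OK_RESULTS = {"passed", "softfailed"}


def retry_comment(retry_url, result):
    """Build the comment to leave on the original job for a finished retry job."""
    if result in OK_RESULTS:
        # Hypothetical wording for the case where the retry was ok.
        return f"Investigate retry job: {retry_url} {result}, likely a sporadic failure"
    # Wording as quoted in the issue description.
    return f"Investigate retry job: {retry_url} failed, likely not a sporadic failure"


print(retry_comment("https://openqa.suse.de/t10461879", "softfailed"))
```

Keeping the set of "ok" results in one place means passed and softfailed cannot drift apart in how they are reported.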

#10

Updated by livdywan about 2 years ago

Subject changed from openQA reports non-sporadic issue when retry job just softfailed to openQA reports non-sporadic issue when retry job just softfailed size:M
Description updated (diff)
Status changed from New to Workable

#11

Updated by tinita about 2 years ago

Status changed from Workable to In Progress
Assignee set to tinita

#12

Updated by tinita about 2 years ago

Status changed from In Progress to Feedback

https://github.com/os-autoinst/scripts/pull/221 Treat softfailed as passed in openqa-investigate

#13

Updated by tinita about 2 years ago

Status changed from Feedback to Resolved

The PR was merged, resolving. @clanig let us know if you see something unexpected again, thanks
