action #132332: Multiple investigation comments for multimachine tests size:M - openQA Project (public) - openSUSE Project Management Tool

action #132332

closed

coordination #102915: [saga][epic] Automated classification of failures

QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Multiple investigation comments for multimachine tests size:M

Added by tinita almost 2 years ago. Updated almost 2 years ago.

Status:

Resolved

Priority:

High

Assignee:

tinita

Category:

Regressions/Crashes

Target version:

Ready

Start date:

2023-03-23

Due date:

% Done:

100%

Estimated time:

Description

Observation¶

We see in https://openqa.suse.de/tests/11507412#comments that several (8) investigate:retry jobs are commenting back to the original job.

Acceptance criteria¶

AC1: There is only one comment based on the result of the "retry" job reporting back the investigation results even if multiple jobs have been cloned (like for parallel clusters)

Suggestions¶

We don't need to display any job from an investigation cluster other than the one corresponding to the original scenario. For example, for a salt-master+salt-minion cluster where the salt-minion job fails, we are only interested in the results of the investigation jobs for the salt-minion test suite.
Find out in the retry job whether it is the job we are actually interested in. E.g. we have salt-master:investigate:retry and salt-minion:investigate:retry. Look up the job referenced by OPENQA_INVESTIGATE_ORIGIN and compare its TEST name to the current TEST: if "salt-minion" matches the beginning of the string "salt-minion:investigate:retry", it is the same test; "salt-master" does not match "salt-minion:investigate:retry". If it doesn't match, just return from the post-investigate function.
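
The comparison suggested above can be sketched in bash roughly as follows. The helper name `is_retry_of_origin` is hypothetical and not part of os-autoinst/scripts. The sketch compares against the full name including the ":investigate:retry" suffix instead of doing a bare prefix match, on the assumption that one test name could otherwise be a prefix of another.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.
# Succeeds only if the current job's TEST is the investigate:retry
# clone of the origin job's TEST.
is_retry_of_origin() {
    local origin_test=$1 current_test=$2
    # The quoted right-hand side is compared literally, not as a glob pattern
    [[ $current_test == "$origin_test:investigate:retry" ]]
}

is_retry_of_origin salt-minion salt-minion:investigate:retry && echo "same test, report results"
is_retry_of_origin salt-master salt-minion:investigate:retry || echo "sibling job, skip"
```

With this check, the salt-master retry job in a cluster would return early and only the salt-minion retry job would comment on the original job.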

Out of scope¶

Don't care if job links for related jobs of a cluster show up or not, e.g. if the "salt-minion" job failed, it does not matter if a parallel "salt-master" job URL shows up in the original investigate comment or not

Further details¶

See #95783 and https://github.com/os-autoinst/scripts/pull/170 for background

Related issues 1 (0 open — 1 closed)


Updated by tinita almost 2 years ago

Copied from action #109920: Identify reproducible product issues using openqa-investigate size:M added


Updated by tinita almost 2 years ago

Description updated (diff)


Updated by tinita almost 2 years ago

Status changed from New to In Progress
Assignee set to tinita


Updated by tinita almost 2 years ago

The actual fix is not that hard I think:

 post-investigate() {
-    local id=$1 old_name=$2
+    local id=$1 retry_name=$2
     local rc=0 status
-    [[ ! "$old_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
+    [[ ! "$retry_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
     # We are in the investigate:retry job now. From here we will check the
     # results of the other investigation jobs, if necessary
     retry_result="$(echo "$job_data" | runjq -r '.job.result')" || return $?
     investigate_origin="$(echo "$job_data" | runjq -r '.job.settings.OPENQA_INVESTIGATE_ORIGIN')" || return 1
     origin_job_id=${investigate_origin#"$host_url/t"}
+    origin_job_data=$(openqa-cli "${client_args[@]}" --json jobs/"$origin_job_id") || return $?
+    origin_name="$(echo "$origin_job_data" | runjq -r '.job.test')" || return $?
+    # cluster jobs might have the same OPENQA_INVESTIGATE_ORIGIN as the root retry job
+    [[ $retry_name != "$origin_name:investigate:retry" ]] && echo "Job $retry_name ($id) is not a retry of $origin_name ($origin_job_id), skipping investigation" && return 0

But now it's time to clean up the tests a bit, because the mocking data is not well structured anymore.
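
For readers unfamiliar with the parameter expansion used in the diff above, here is a minimal sketch of how the origin job ID is derived from OPENQA_INVESTIGATE_ORIGIN. The host name and job ID are made-up example values, and the assumption that the setting has the form "$host_url/t<id>" is inferred from the diff.

```shell
#!/usr/bin/env bash
# Example values only; in the real script these come from the
# environment and from the retry job's settings.
host_url="https://openqa.example.com"
investigate_origin="$host_url/t12345"

# Strip the literal prefix "$host_url/t"; quoting inside the expansion
# prevents characters in the URL from being treated as glob patterns.
origin_job_id=${investigate_origin#"$host_url/t"}
echo "$origin_job_id"
```

If the setting does not start with the expected prefix, the expansion leaves the value unchanged, so a real script may want to validate the result before passing it to openqa-cli.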


Updated by okurz almost 2 years ago

Subject changed from Multiple investigation comments for multimachine tests to Multiple investigation comments for multimachine tests size:M
Description updated (diff)


Updated by tinita almost 2 years ago

Status changed from In Progress to Feedback

https://github.com/os-autoinst/scripts/pull/246 Consider only direct retry job for investigation


Updated by tinita almost 2 years ago

Merged and deployed. Will check later if there are still multiple comments.

A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments

I wonder if investigate:retry jobs should actually have the RETRY setting removed?


Updated by okurz almost 2 years ago

tinita wrote:

Merged and deployed. Will check later if there are still multiple comments.

A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments

I wonder if investigate:retry jobs should actually have the RETRY setting removed?

Don't care about the RETRY flag or whatever else retriggered the same test over and over again. We should only prevent the comments from sibling jobs in a cluster.
