action #132332

closed

coordination #102915: [saga][epic] Automated classification of failures

QA - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests

Multiple investigation comments for multimachine tests size:M

Added by tinita over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-03-23
Due date:
% Done: 100%
Estimated time:

Description

Observation

We see in https://openqa.suse.de/tests/11507412#comments that several (8) investigate:retry jobs are commenting back to the original job.

Acceptance criteria

  • AC1: There is only one comment based on the result of the "retry" job reporting back the investigation results even if multiple jobs have been cloned (like for parallel clusters)

Suggestions

  • We don't need to display any job from an investigation cluster other than the one corresponding to the original scenario. For example, for a salt-master+salt-minion job where the salt-minion job fails, we are only interested in the results of investigation jobs for the salt-minion testsuite.
  • In the retry job, find out whether it is the job we are actually interested in. For example, given salt-master:investigate:retry and salt-minion:investigate:retry, look up the job referenced by OPENQA_INVESTIGATE_ORIGIN and compare its TEST name to the current TEST: if "salt-minion" matches the beginning of the string "salt-minion:investigate:retry", it is the same test, whereas "salt-master" does not match "salt-minion:investigate:retry". If it doesn't match, just return from the post-investigate function.
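
The name comparison suggested above could look roughly like the following sketch. The helper name is_direct_retry is hypothetical and not part of os-autoinst/scripts. It compares against the full expected name instead of only checking a prefix, on the assumption that a prefix check could also match a sibling whose test name merely starts with the origin name.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration: succeeds only if retry_name is the
# direct investigate:retry job of origin_name.
is_direct_retry() {
    local origin_name=$1 retry_name=$2
    [[ $retry_name == "$origin_name:investigate:retry" ]]
}

# Example cluster where the salt-minion job was the one that failed.
for name in salt-master:investigate:retry salt-minion:investigate:retry; do
    if is_direct_retry salt-minion "$name"; then
        echo "$name: report investigation results"
    else
        echo "$name: skip"
    fi
done
```

With this check only the salt-minion:investigate:retry job would comment back; the salt-master sibling returns early.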

Out of scope

  • Don't care if job links for related jobs of a cluster show up or not, e.g. if the "salt-minion" job failed, it does not matter if a parallel "salt-master" job URL shows up in the original investigate comment or not

Further details


Related issues 1 (0 open, 1 closed)

Copied from openQA Project - action #109920: Identify reproducible product issues using openqa-investigate size:M (Resolved, tinita, 2023-03-23)

Actions #1

Updated by tinita over 1 year ago

  • Copied from action #109920: Identify reproducible product issues using openqa-investigate size:M added
Actions #2

Updated by tinita over 1 year ago

  • Description updated (diff)
Actions #3

Updated by tinita over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to tinita
Actions #4

Updated by tinita over 1 year ago

The actual fix is not that hard I think:

 post-investigate() {
-    local id=$1 old_name=$2
+    local id=$1 retry_name=$2
     local rc=0 status
-    [[ ! "$old_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
+    [[ ! "$retry_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
     # We are in the investigate:retry job now. From here we will check the
     # results of the other investigation jobs, if necessary
     retry_result="$(echo "$job_data" | runjq -r '.job.result')" || return $?
     investigate_origin="$(echo "$job_data" | runjq -r '.job.settings.OPENQA_INVESTIGATE_ORIGIN')" || return 1
     origin_job_id=${investigate_origin#"$host_url/t"}
+    origin_job_data=$(openqa-cli "${client_args[@]}" --json jobs/"$origin_job_id") || return $?
+    origin_name="$(echo "$origin_job_data" | runjq -r '.job.test')" || return $?
+    # cluster jobs might have the same OPENQA_INVESTIGATE_ORIGIN as the root retry job
+    [[ $retry_name != "$origin_name:investigate:retry" ]] && echo "Job $retry_name ($id) is not a retry of $origin_name ($origin_job_id), skipping investigation" && return 0

But now it's time to clean up the tests a bit, because the mocking data is not well structured anymore.
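
The diff above derives the origin job id by stripping the "$host_url/t" prefix from OPENQA_INVESTIGATE_ORIGIN. A minimal standalone sketch of that step follows; the function name origin_id_from_url, the example host and the numeric-id check are assumptions added for illustration and are not in the actual script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the job id from an OPENQA_INVESTIGATE_ORIGIN
# value of the form "$host_url/t<ID>".
origin_id_from_url() {
    local host_url=$1 investigate_origin=$2
    local id=${investigate_origin#"$host_url/t"}
    # If the prefix did not match, nothing was stripped and id is not numeric.
    [[ $id =~ ^[0-9]+$ ]] || return 1
    echo "$id"
}

# Example host and job id are made up for this sketch.
origin_id_from_url https://openqa.example.com https://openqa.example.com/t1234
```

The extra numeric check is a design choice of this sketch: it makes a mismatching host fail early instead of passing an unstripped URL on to the job lookup.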

Actions #5

Updated by okurz over 1 year ago

  • Subject changed from Multiple investigation comments for multimachine tests to Multiple investigation comments for multimachine tests size:M
  • Description updated (diff)
Actions #6

Updated by tinita over 1 year ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/scripts/pull/246 Consider only direct retry job for investigation

Actions #7

Updated by tinita over 1 year ago

Merged and deployed. Will check later if there are still multiple comments.

A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments

I wonder if investigate:retry jobs should actually have the RETRY setting removed?

Actions #8

Updated by okurz over 1 year ago

tinita wrote:

Merged and deployed. Will check later if there are still multiple comments.

A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments

I wonder if investigate:retry jobs should actually have the RETRY setting removed?

Don't care about the RETRY flag or whatever else retriggered the same test over and over again. We should only prevent the comments from sibling jobs in a cluster.

Actions #9

Updated by okurz over 1 year ago

  • Due date changed from 2023-07-08 to 2023-07-21

Updating the due date, which was originally taken over from the ticket this one was copied from.

Actions #10

Updated by tinita over 1 year ago

Another small PR: https://github.com/os-autoinst/scripts/pull/247 Add test name to retry job comment (merged)

Actions #11

Updated by okurz over 1 year ago

Looks good. Can we resolve, or is there something else you are waiting for, such as validation?

Actions #12

Updated by tinita over 1 year ago

Yes, I would like to validate, and I haven't come up with a good SQL statement yet.

Actions #13

Updated by tinita over 1 year ago

  • Status changed from Feedback to Resolved

Ok, let's assume this works. I will be away for one week, so no time to check right now. Also I actually trust the unit tests.

Actions #14

Updated by okurz about 1 year ago

  • Due date deleted (2023-07-21)