action #132332
coordination #102915 (closed): [saga][epic] Automated classification of failures
QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Multiple investigation comments for multimachine tests size:M
Description
Observation
We see in https://openqa.suse.de/tests/11507412#comments that several (8) investigate:retry jobs are commenting back to the original job.
Acceptance criteria
- AC1: There is only one comment based on the result of the "retry" job reporting back the investigation results even if multiple jobs have been cloned (like for parallel clusters)
Suggestions
- We don't need to display any other job from an investigation cluster than one corresponding to the original scenario. For example for a salt-master+salt-minion job where the salt-minion job fails we are only interested in results of investigation jobs for the salt-minion testsuite
- Find out in the retry job if we are the actual job we're interested in. E.g. we have salt-master:investigate:retry and salt-minion:investigate:retry. Look into the OPENQA_INVESTIGATE_ORIGIN job data, look at the TEST name and compare it to the current TEST, e.g. if "salt-minion" matches the beginning of the string "salt-minion:investigate:retry" then it's the same test, "salt-master" is not the same as "salt-minion:investigate:retry". If it doesn't match, just return from the post-investigate function.
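The name comparison suggested above could look roughly like the following. This is only a minimal sketch: the helper name is_retry_of_origin is hypothetical and not part of os-autoinst/scripts, and it assumes both TEST names are already available as plain strings. It compares the part before ":investigate:" exactly instead of doing a bare prefix match, so that a shorter name such as "salt" would not accidentally match "salt-minion:investigate:retry".

```bash
#!/usr/bin/env bash

# Hypothetical helper (assumption, not taken from os-autoinst/scripts).
# Returns 0 if the investigation job name $2 belongs to the original test $1.
is_retry_of_origin() {
    local origin=$1 current=$2
    # Drop everything from the first ":investigate:" onwards and compare
    # the remaining base name with the TEST name of the original job.
    [[ "${current%%:investigate:*}" == "$origin" ]]
}

is_retry_of_origin salt-minion salt-minion:investigate:retry && echo "same test"
is_retry_of_origin salt-master salt-minion:investigate:retry || echo "different test, return early"
```

In post-investigate the second case would simply return 0 before any comment is written.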
Out of scope
- Don't care if job links for related jobs of a cluster show up or not, e.g. if the "salt-minion" job failed, it does not matter if a parallel "salt-master" job URL shows up in the original investigate comment or not
Further details
- See #95783 and https://github.com/os-autoinst/scripts/pull/170 for background
Updated by tinita over 1 year ago
- Copied from action #109920: Identify reproducible product issues using openqa-investigate size:M added
Updated by tinita over 1 year ago
- Status changed from New to In Progress
- Assignee set to tinita
Updated by tinita over 1 year ago
The actual fix is not that hard I think:
 post-investigate() {
-    local id=$1 old_name=$2
+    local id=$1 retry_name=$2
     local rc=0 status
-    [[ ! "$old_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
+    [[ ! "$retry_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
     # We are in the investigate:retry job now. From here we will check the
     # results of the other investigation jobs, if necessary
     retry_result="$(echo "$job_data" | runjq -r '.job.result')" || return $?
     investigate_origin="$(echo "$job_data" | runjq -r '.job.settings.OPENQA_INVESTIGATE_ORIGIN')" || return 1
     origin_job_id=${investigate_origin#"$host_url/t"}
+    origin_job_data=$(openqa-cli "${client_args[@]}" --json jobs/"$origin_job_id") || return $?
+    origin_name="$(echo "$origin_job_data" | runjq -r '.job.test')" || return $?
+    # cluster jobs might have the same OPENQA_INVESTIGATE_ORIGIN as the root retry job
+    [[ $retry_name != "$origin_name:investigate:retry" ]] && echo "Job $retry_name ($id) is not a retry of $origin_name ($origin_job_id), skipping investigation" && return 0
But now it's time to clean up the tests a bit, because the mock data is no longer well structured.
Updated by okurz over 1 year ago
- Subject changed from Multiple investigation comments for multimachine tests to Multiple investigation comments for multimachine tests size:M
- Description updated (diff)
Updated by tinita over 1 year ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/scripts/pull/246 Consider only direct retry job for investigation
Updated by tinita over 1 year ago
Merged and deployed. Will check later if there are still multiple comments.
A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments
I wonder if investigate:retry jobs should actually have the RETRY setting removed?
Updated by okurz over 1 year ago
tinita wrote:
Merged and deployed. Will check later if there are still multiple comments.
A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments
I wonder if investigate:retry jobs should actually have the RETRY setting removed?
Don't care about the RETRY flag or whatever else retriggered the same test over and over again. We should only prevent the comments from siblings in a cluster.
Updated by okurz over 1 year ago
- Due date changed from 2023-07-08 to 2023-07-21
Updating the due date, which was originally taken over from the ticket this one was copied from.
Updated by tinita over 1 year ago
Another small PR: https://github.com/os-autoinst/scripts/pull/247 Add test name to retry job comment (merged)
Updated by okurz over 1 year ago
Looks good. Can we resolve or is there something else you wait for, like for validation?
Updated by tinita over 1 year ago
Yes, I would like to validate, but I haven't come up with a good SQL statement yet.
Updated by tinita over 1 year ago
- Status changed from Feedback to Resolved
Ok, let's assume this works. I will be away for one week, so no time to check right now. Also I actually trust the unit tests.