action #132332
closed coordination #102915: [saga][epic] Automated classification of failures
QA - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Multiple investigation comments for multimachine tests size:M
Description
Observation
We see in https://openqa.suse.de/tests/11507412#comments that several investigate:retry jobs are commenting back to the original job.
Acceptance criteria
- AC1: There is only one comment based on the result of the "retry" job reporting back the investigation results even if multiple jobs have been cloned (like for parallel clusters)
Suggestions
- We don't need to display any other job from an investigation cluster than one corresponding to the original scenario. For example for a salt-master+salt-minion job where the salt-minion job fails we are only interested in results of investigation jobs for the salt-minion testsuite
- Find out in the retry job if we are the actual job we're interested in. E.g. we have salt-master:investigate:retry and salt-minion:investigate:retry. Look into the OPENQA_INVESTIGATE_ORIGIN job data, look at the TEST name and compare it to the current TEST, e.g. if "salt-minion" matches the beginning of the string "salt-minion:investigate:retry" then it's the same test, whereas "salt-master" is not the same as "salt-minion:investigate:retry". If it doesn't match, just return from the post-investigate function.
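The name comparison suggested above could be sketched roughly as follows. This is only an illustration: the function name same_scenario is hypothetical and not part of os-autoinst/scripts, and it assumes the origin TEST and the current TEST are already available as plain strings.

```bash
# Hypothetical helper (not from os-autoinst/scripts): succeeds if the
# current TEST name begins with the TEST name of the origin job.
same_scenario() {
    local origin_test=$1 current_test=$2
    case $current_test in
        "$origin_test"*) return 0 ;; # quoted part is literal, * matches the rest
        *) return 1 ;;
    esac
}

# Only the retry of the originally failed test would go on to comment
if same_scenario "salt-minion" "salt-minion:investigate:retry"; then
    echo "same test, continue with post-investigate"
fi
if ! same_scenario "salt-master" "salt-minion:investigate:retry"; then
    echo "different test, return early"
fi
```

Note that a pure prefix match would also accept any other test whose name merely starts with the origin name, so a stricter comparison may be preferable.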
Out of scope
- Don't care if job links for related jobs of a cluster show up or not, e.g. if the "salt-minion" job failed, it does not matter if a parallel "salt-master" job URL shows up in the original investigate comment or not
Further details
- See #95783 and https://github.com/os-autoinst/scripts/pull/170 for background
Updated by tinita 10 months ago
- Copied from action #109920: Identify reproducible product issues using openqa-investigate size:M added
Updated by tinita 10 months ago
The actual fix is not that hard I think:
 post-investigate() {
-    local id=$1 old_name=$2
+    local id=$1 retry_name=$2
     local rc=0 status
-    [[ ! "$old_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
+    [[ ! "$retry_name" =~ investigate:retry$ ]] && echo "Job is ':investigate:' already, skipping investigation" && return 0
     # We are in the investigate:retry job now. From here we will check the
     # results of the other investigation jobs, if necessary
     retry_result="$(echo "$job_data" | runjq -r '.job.result')" || return $?
     investigate_origin="$(echo "$job_data" | runjq -r '.job.settings.OPENQA_INVESTIGATE_ORIGIN')" || return 1
     origin_job_id=${investigate_origin#"$host_url/t"}
+    origin_job_data=$(openqa-cli "${client_args[@]}" --json jobs/"$origin_job_id") || return $?
+    origin_name="$(echo "$origin_job_data" | runjq -r '.job.test')" || return $?
+    # cluster jobs might have the same OPENQA_INVESTIGATE_ORIGIN as the root retry job
+    [[ $retry_name != "$origin_name:investigate:retry" ]] && echo "Job $retry_name ($id) is not a retry of $origin_name ($origin_job_id), skipping investigation" && return 0
But now it's time to clean up the tests a bit, because the mocking data is not well structured anymore.
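To make the two string operations in the diff easier to follow in isolation, here is a small standalone sketch. It assumes, as the diff implies, that OPENQA_INVESTIGATE_ORIGIN has the form "$host_url/t<job id>". The function names, the host openqa.example.com, the job ID 123 and the test name salt-minion-extra are made up for illustration; the real code fetches the origin job via openqa-cli instead of taking the name as an argument.

```bash
# Hypothetical helper: strip the "$host_url/t" prefix to get the job ID,
# using the same parameter expansion as in the diff.
origin_job_id_from_url() {
    local host_url=$1 investigate_origin=$2
    printf '%s\n' "${investigate_origin#"$host_url/t"}"
}

# Hypothetical helper: exact comparison as in the diff. Only the job named
# "<origin test>:investigate:retry" counts as the direct retry.
is_direct_retry() {
    local retry_name=$1 origin_name=$2
    [ "$retry_name" = "$origin_name:investigate:retry" ]
}

origin_job_id_from_url "https://openqa.example.com" "https://openqa.example.com/t123"

for name in salt-minion:investigate:retry salt-master:investigate:retry salt-minion-extra:investigate:retry; do
    if is_direct_retry "$name" "salt-minion"; then
        echo "$name: direct retry, report results"
    else
        echo "$name: not a retry of salt-minion, skip"
    fi
done
```

Because the comparison is exact rather than a prefix test, a sibling job whose name only starts with the origin name (the invented salt-minion-extra here) is skipped as well.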
Updated by tinita 10 months ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/scripts/pull/246 Consider only direct retry job for investigation
Updated by tinita 10 months ago
Merged and deployed. Will check later if there are still multiple comments.
A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
https://openqa.suse.de/tests/11530247#comments
I wonder if investigate:retry jobs should actually have the RETRY setting removed?
Updated by okurz 10 months ago
tinita wrote:
> Merged and deployed. Will check later if there are still multiple comments.
> A bit hard to find out because we also get multiple comments for jobs which have a RETRY set:
> https://openqa.suse.de/tests/11530247#comments
> I wonder if investigate:retry jobs should actually have the RETRY setting removed?
Don't care about the RETRY flag or whatever else retriggered the same test over and over again. We should only prevent the comments from sibling jobs in a cluster.
Updated by tinita 10 months ago
Another small PR: https://github.com/os-autoinst/scripts/pull/247 Add test name to retry job comment (merged)