action #154900
Statistic frequency for the support images issues
Status: open, 0% done
Description
Motivation
After the removal of the copied support images in OSD hdd/fixed/, we should collect statistics on how often support image issues occur, such as incomplete jobs, images that were published but then cleaned up, and other issues. We can gather and list the problems we find in comments, then evaluate their frequency and decide how to fix them.
Acceptance criteria
AC1: The frequency of support image issues is tracked.
Updated by syrianidou_sofia about 2 months ago
s390x published support images deleted 9 hours after publishing
Updated by lmanfredi about 2 months ago
A small automation (bash script) that gathers all missing qcow2 images by searching the incomplete jobs of all our job groups:
#!/usr/bin/env bash
set -e

# List the qcow2 images named in the failure reason of incomplete jobs
function GET_RESULT_INCOMPLETE_ALL_GROUPS() {
  declare -a groups=(
    265 # SLE15 / Migration
    535 # YaST & MMU
    510 # Migration
    129 # SLE15 / YaST
    421 # YaST MU
    478 # Migration Misc.
    266 # Migration Milestone
    520 # Yam Support Image
  )
  # Build the "&groupid=N" query string for all groups
  local params=''
  for group in "${groups[@]}"; do
    params="$params&groupid=$group"
  done
  local API_URL="https://openqa.suse.de/api/v1/jobs/overview?result=incomplete$params"
  # -s hides the progress meter, -k skips TLS certificate verification
  declare -a ids=(
    $(curl -sk "$API_URL" | jq '.[].id')
  )
  # Associative array used as a set to deduplicate image names
  declare -A dict_qcow=()
  for id in "${ids[@]}"; do
    # First "*.qcow2" token of the reason; empty if there is no reason or no match
    qcow=$(curl -sk "https://openqa.suse.de/api/v1/jobs/${id}/details" | jq -r '.job.reason // "" | [scan("\\S+\\.qcow2")] | first // empty')
    # qcow=$(curl -sk "https://openqa.suse.de/api/v1/jobs/${id}/details" | jq -r '.job.reason' | perl -ne 'print "$1" if /Failed to download (\S+)/')
    if [[ -n "$qcow" ]]; then
      dict_qcow["$qcow"]=1
    fi
  done
  # Nothing to print if no image was found
  (( ${#dict_qcow[@]} > 0 )) || return 0
  local -a s_qcow
  mapfile -t s_qcow < <(printf '%s\n' "${!dict_qcow[@]}" | sort)
  for qcow in "${s_qcow[@]}"; do
    echo "- ${qcow}"
  done
}

GET_RESULT_INCOMPLETE_ALL_GROUPS
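The script above deduplicates the image names, so it shows which images are missing but not how often each one breaks a job. Since AC1 asks for frequency, here is a minimal sketch of a counting variant. It assumes the job reasons have already been fetched and are fed in one per line on stdin, and it reuses the "Failed to download <image>" pattern from the perl one-liner; the function name `count_missing_qcow` is hypothetical.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: reads one job reason per line on stdin and prints
# "<count> <image>" for each qcow2 image named in a "Failed to download <image>"
# message, most frequent first (ties sorted by image name).
count_missing_qcow() {
  local line img
  local re='Failed to download ([^[:space:]]+\.qcow2)'
  declare -A freq=()
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      img=${BASH_REMATCH[1]}
      freq[$img]=$(( ${freq[$img]:-0} + 1 ))
    fi
  done
  for img in "${!freq[@]}"; do
    printf '%d %s\n' "${freq[$img]}" "$img"
  done | sort -k1,1nr -k2,2
}
```

With `ids` filled as in the script above, usage might look like `for id in "${ids[@]}"; do curl -sk "https://openqa.suse.de/api/v1/jobs/${id}" | jq -r '.job.reason'; done | count_missing_qcow`.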
Updated by syrianidou_sofia about 2 months ago
We could write a script that lists all PUBLISH_HDD images from successful jobs in our job groups and then checks /var/lib/openqa/factory/hdd/ on OSD to see which ones have been deleted.
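Here is a minimal sketch of the comparison step. It assumes the PUBLISH_HDD values have already been collected into a text file (one image name per line), for example with the `jq '.job.settings.PUBLISH_HDD_1'` call used in the later script. It also assumes it runs on OSD itself, where /var/lib/openqa/factory/hdd/ is readable. The function name `find_deleted_images` and the input file are hypothetical.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: prints every image listed in file $1 (one name per line,
# duplicates allowed) that does not exist in directory $2.
find_deleted_images() {
  local list=$1 dir=$2 img
  sort -u -- "$list" | while IFS= read -r img; do
    # Skip blank lines in the input list
    [[ -z $img ]] && continue
    if [[ ! -e "$dir/$img" ]]; then
      printf '%s\n' "$img"
    fi
  done
}
```

On OSD this could be called as `find_deleted_images published.txt /var/lib/openqa/factory/hdd`. Note that it only checks the top level of the directory, not subdirectories such as fixed/.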
Updated by JERiveraMoya about 2 months ago
syrianidou_sofia wrote in #note-2:
s390x published support images deleted 9 hours after publishing
The only root cause we found was that maintenance had been triggered several times since the previous day.
Updated by JERiveraMoya about 2 months ago
syrianidou_sofia wrote in #note-4:
We could write a script that lists all PUBLISH_HDD images from successful jobs in our job groups and then checks /var/lib/openqa/factory/hdd/ on OSD to see which ones have been deleted.
The goal of this ticket was to keep track of the problem and identify root causes, but I see that you are trying to go one step further :) with automation to detect when this happens. I believe this problem happens too rarely to justify investing much time in it, but if you find it interesting, we should investigate WHERE to put those scripts, because I suspect the tools team already has some solution for monitoring jobs and assets.
Updated by tinawang123 about 1 month ago
Missing qcow image (download failed: 404 Not Found): /var/lib/openqa/cache/openqa.suse.de/autoyast_SLEHPC-15-SP3-aarch64-DEV-gnome-defpatterns-updated.qcow2
Failed job: https://openqa.suse.de/tests/13856251
Updated by lmanfredi about 1 month ago
As suggested by Sofia in #note-4, this version also lists the jobs whose PUBLISH_HDD_1 matches one of the missing images:
#!/usr/bin/env bash
set -e

function SEARCH_INCOMPLETE_ALL_GROUPS() {
  declare -a groups=(
    265 # SLE15 / Migration
    535 # YaST & MMU
    510 # Migration
    129 # SLE15 / YaST
    421 # YaST MU
    478 # Migration Misc.
    266 # Migration Milestone
    520 # Yam Support Image
  )
  local params=''
  for group in "${groups[@]}"; do
    params="$params&groupid=$group"
  done
  local API_URL="https://openqa.suse.de/api/v1/jobs/overview?result=incomplete$params"
  declare -a ids=(
    $(curl -sk "$API_URL" | jq '.[].id')
  )
  echo "# List of incomplete jobs:"
  declare -A dict_qcow=()
  for id in "${ids[@]}"; do
    json="$(curl -sk "https://openqa.suse.de/api/v1/jobs/${id}")"
    # Here-string keeps the JSON from being word-split or globbed
    reason="$(jq -r '.job.reason' <<<"$json")"
    printf 'https://openqa.suse.de/tests/%s\t[%s]\n' "$id" "$reason"
    qcow=$(perl -ne 'print "$1" if /Failed to download (\S+)/' <<<"$reason")
    if [[ -n "$qcow" ]]; then
      dict_qcow["$qcow"]=1
    fi
  done
  # Stop here if no download failure was found
  (( ${#dict_qcow[@]} > 0 )) || return 0
  echo -e "\n# Failed to download:"
  local -a s_qcow
  mapfile -t s_qcow < <(printf '%s\n' "${!dict_qcow[@]}" | sort)
  for qcow in "${s_qcow[@]}"; do
    echo "${qcow}"
  done
  echo -e "\n# Jobs to restart:"
  # Jobs of the support image group (520) that may publish the missing images
  ids=(
    $(curl -sk "https://openqa.suse.de/api/v1/jobs/overview?groupid=520" | jq '.[].id')
  )
  # ids=(
  #   $(curl -sk "https://openqa.suse.de/api/v1/jobs/overview?groupid=520&groupid=446&groupid=265&groupid=129" | jq '.[].id')
  # )
  for id in "${ids[@]}"; do
    json="$(curl -sk "https://openqa.suse.de/api/v1/jobs/${id}")"
    PUBLISH_HDD_1="$(jq -r '.job.settings.PUBLISH_HDD_1 // empty' <<<"$json")"
    TEST="$(jq -r '.job.settings.TEST // empty' <<<"$json")"
    if [[ -n "$PUBLISH_HDD_1" ]]; then
      for qcow in "${s_qcow[@]}"; do
        if [[ "$PUBLISH_HDD_1" == "$qcow" ]]; then
          echo "[$TEST]: https://openqa.suse.de/tests/$id for $qcow"
        fi
      done
    fi
  done
}

SEARCH_INCOMPLETE_ALL_GROUPS
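The last loop of the script compares every PUBLISH_HDD_1 value against every missing image. A possible alternative does the same matching with an associative-array lookup. The sketch below reads job data as tab-separated `id<TAB>TEST<TAB>PUBLISH_HDD_1` lines instead of calling the API, so it can be tried offline. This input format and the function name `match_publishers` are hypothetical. It prints lines in the same `[TEST]: URL for image` shape.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: $1 is a file with one missing image name per line;
# stdin carries "id<TAB>TEST<TAB>PUBLISH_HDD_1" lines for candidate jobs.
# Prints the jobs whose published image is among the missing ones.
match_publishers() {
  local missing_file=$1 img id name hdd
  declare -A missing=()
  while IFS= read -r img; do
    if [[ -n $img ]]; then
      missing[$img]=1
    fi
  done < "$missing_file"
  while IFS=$'\t' read -r id name hdd; do
    # Jobs that publish nothing cannot match
    if [[ -z $hdd ]]; then
      continue
    fi
    # Constant-time lookup instead of looping over all missing images per job
    if [[ -n ${missing[$hdd]:-} ]]; then
      printf '[%s]: https://openqa.suse.de/tests/%s for %s\n' "$name" "$id" "$hdd"
    fi
  done
}
```

The tab-separated input could be produced per job with jq's `@tsv`, e.g. `jq -r '[.job.id, .job.settings.TEST, .job.settings.PUBLISH_HDD_1 // ""] | @tsv'`. The lookup matters little for a single job group, but it keeps the cost linear if more groups are added, as in the commented-out query.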