action #152281
coordination #102915 (closed): [saga][epic] Automated classification of failures
QA (public) - coordination #94105: [epic] Use feedback from openqa-investigate to automatically inform on github pull requests, open tickets, weed out automatically failed tests
Schedule openQA SLE maintenance bisect jobs with lower priority same as openqa-investigate
Description
Motivation
In openqa-investigate we already add +100 to the prio value to give production jobs priority. For openqa-trigger-bisect-jobs (https://github.com/os-autoinst/scripts/blob/master/openqa-trigger-bisect-jobs) we do not do that yet, which leads to problems as mentioned by mgriessmeier in https://suse.slack.com/archives/C02CANHLANP/p1702031247953039. So we should also ensure that we can adjust the priority of generated jobs in openqa-trigger-bisect-jobs.
Acceptance criteria
- AC1: All jobs created by openqa-trigger-bisect-jobs have a prio value of at least 100
Suggestions
- Look for "prio_add" in https://github.com/os-autoinst/scripts/blob/master/openqa-investigate
- Do the equivalent in https://github.com/os-autoinst/scripts/blob/master/openqa-trigger-bisect-jobs
- Monitor effect in production
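A minimal sketch of what the "prio_add" equivalent could look like in Python, assuming the same arithmetic as in openqa-investigate (base prio plus an offset) and the same JSON payload shape for the priority update. The names `lowered_priority`, `priority_payload` and `DEFAULT_PRIO_ADD` are hypothetical and not taken from the scripts repository.

```python
import json

# Assumption: mirrors the +100 offset mentioned in the motivation.
DEFAULT_PRIO_ADD = 100


def lowered_priority(base_prio: int, prio_add: int = DEFAULT_PRIO_ADD) -> int:
    """Return the prio value for a generated job.

    A larger prio value means the job yields to production jobs,
    so adding an offset lowers the effective priority.
    """
    return base_prio + prio_add


def priority_payload(base_prio: int, prio_add: int = DEFAULT_PRIO_ADD) -> str:
    """Build the JSON body for a priority update: {"priority": N}."""
    return json.dumps({"priority": lowered_priority(base_prio, prio_add)})


if __name__ == "__main__":
    # With the default prio of 50, generated jobs would end up at 150.
    print(priority_payload(50))
```

Keeping the offset as a parameter would allow adjusting the priority of generated jobs without touching the code, which is what the motivation asks for.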
Updated by mkittler about 1 year ago
- Status changed from In Progress to Feedback
Updated by okurz about 1 year ago
- Status changed from Feedback to Workable
https://openqa.suse.de/tests?match=:investigate: shows multiple jobs with default prio 50, e.g. https://openqa.suse.de/tests/13042757 which is "qam_alpha_supportserver:investigate:last_good_tests_and_build:f69e77d29d96cab7c9a3e18c5cd2cfb73f371ee4+20231208-1", so not triggered by the bisect script but related. I assume this is related to jobs in a multi-machine cluster where maybe only the initial job gets the +100 prio value and the others do not. That problem might apply to both openqa-investigate and openqa-bisect.
Updated by mkittler about 1 year ago
- Status changed from Workable to In Progress
No, this problem only applies to openqa-investigate.
In openqa-trigger-bisect-jobs I implemented setting the prio properly via a loop over all jobs:
for job_id in sorted(created_job_ids):
    log.info(f"Created {job_id}")
    created += f"* **{test_name}**: {base_url}/t{job_id}\n"
    openqa_set_job_prio(job_id, args.url, prio, args.dry_run)
Only in openqa-investigate we have code that doesn't take into account that we might have cloned multiple jobs:
# output: { "$id": $clone_id }
clone_id=$(echo "$out" | runjq -r ".\"$id\"")
# Create markdown list entry
echo "* *$name*: t#$clone_id"
"${client_call[@]}" --json --data "{\"priority\": $((base_prio + prio_add))}" -X PUT jobs/"$clone_id" >/dev/null
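To illustrate the difference, here is a rough sketch of handling every clone instead of a single one. It is written in Python for consistency with the bisect script, not as a proposed patch for the shell code. It assumes the clone output is a JSON object with one "original id": clone id entry per cloned job, which is an extrapolation from the `{ "$id": $clone_id }` comment above. The function name `clone_priorities` and the job IDs in the example are made up.

```python
import json
from typing import Dict


def clone_priorities(clone_output: str, base_prio: int, prio_add: int = 100) -> Dict[int, int]:
    """Map every clone id in the output to its target prio value.

    Assumption: clone_output looks like {"<original id>": <clone id>, ...}
    and may contain several entries for a multi-machine cluster.
    """
    clones = json.loads(clone_output)
    target = base_prio + prio_add
    # Iterate over all values rather than looking up one original id.
    return {int(clone_id): target for clone_id in clones.values()}


if __name__ == "__main__":
    # Hypothetical output for a two-job cluster.
    out = '{"1": 101, "2": 102}'
    for clone_id, prio in sorted(clone_priorities(out, 50).items()):
        print(f"would set prio {prio} on job {clone_id}")
```

Each returned entry would then get its own priority update, so that the parallel jobs of a cluster do not stay at the default prio.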
Updated by mkittler about 1 year ago
- Status changed from In Progress to Feedback
Updated by mkittler about 1 year ago
- Status changed from Feedback to Resolved
The PR has been merged and it looks like it works in production, e.g. https://openqa.suse.de/tests/13098102 and https://openqa.suse.de/tests/13098107.
There are jobs with prio 80 but those have likely been changed manually (because 80 is also not the default prio):
openqa=> select id, priority from jobs where state = 'scheduled' and priority < 150 and test like '%:investigate:%' limit 10;
    id    | priority
----------+----------
 13097441 |       80
 13097440 |       80
(2 rows)
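The same check can be mirrored outside the database, for example when monitoring the effect in production. The sketch below assumes the rows are already available as (id, priority) pairs; the helper name `jobs_below_threshold` is hypothetical, and the threshold of 150 follows from the default prio 50 plus the +100 offset.

```python
from typing import Iterable, List, Tuple


def jobs_below_threshold(rows: Iterable[Tuple[int, int]], threshold: int = 150) -> List[int]:
    """Return the ids of jobs whose prio value is below the threshold."""
    return [job_id for job_id, priority in rows if priority < threshold]


if __name__ == "__main__":
    # Rows taken from the query result above.
    rows = [(13097441, 80), (13097440, 80)]
    print(jobs_below_threshold(rows))
```

An empty result would mean that all scheduled investigate jobs carry the expected lowered priority.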