action #181421
Updated by tinita 30 days ago
## Observation
When investigation jobs are cancelled or obsoleted, the `investigate:retry` job retries up to 1440 times with a delay of one minute (roughly 24 hours in total) before giving up:
https://openqa.suse.de/minion/jobs?id=15293289
If many of those jobs exist, Minion spends a lot of time retrying them instead of working on other jobs that could run and do useful work.
Maybe openqa-investigate can detect that jobs are cancelled.
And maybe the retrying can be implemented with exponential backoff as an additional improvement, for example when investigation jobs aren't picked up for a long time.
* **AC1:** If investigation jobs are cancelled, openqa-investigate detects that and creates the final comment without waiting any longer
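To make the exponential backoff idea more concrete, here is a minimal sketch in Python. It is not how Minion or the hook script currently computes delays. The function names and the `base`, `factor` and `cap` parameters are hypothetical, and the one-hour cap is an arbitrary choice for illustration.

```python
def backoff_delay(retries, base=60, factor=2, cap=3600):
    """Delay in seconds before the next retry.

    Starts at `base` seconds and multiplies by `factor` for every retry
    already performed, never exceeding `cap`. All parameters are assumptions.
    """
    delay = base
    for _ in range(retries):
        delay *= factor
        if delay >= cap:
            return cap
    return min(delay, cap)


def retries_needed(total_wait, **kwargs):
    """Number of retries until the summed delays reach `total_wait` seconds."""
    waited = 0
    retries = 0
    while waited < total_wait:
        waited += backoff_delay(retries, **kwargs)
        retries += 1
    return retries


if __name__ == "__main__":
    day = 24 * 60 * 60
    # factor=1 models the current fixed one-minute delay
    print("fixed delay:", retries_needed(day, factor=1))
    print("exponential backoff:", retries_needed(day))
```

With these assumed values, covering the same 24-hour window takes 29 retries instead of 1440, while the first few retries still happen within minutes.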
```
---
args:
- env ... enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de
  notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com
  host=openqa.suse.de investigation_gid=637 exclude_name_regex='.*(SAPHanaSR|saptune).*'
  exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization|BCI).*'
  grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
- 17444939
- delay: 60
  kill_timeout: 10s
  retries: 1440
  skip_rc: 142
  timeout: 10m
attempts: 1
children: []
created: 2025-04-24T15:48:34.951746Z
delayed: 2025-04-25T10:20:42.619188Z
expires: ~
finished: ~
id: 15293289
lax: 0
notes:
  hook_cmd: env ... enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de
    notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com
    host=openqa.suse.de investigation_gid=637 exclude_name_regex='.*(SAPHanaSR|saptune).*'
    exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization|BCI).*'
    grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook
  hook_rc: 142
  hook_result: ''
parents: []
priority: 0
queue: default
result: ~
retried: 2025-04-25T10:19:42.619188Z
retries: 921
started: 2025-04-25T10:19:35.150145Z
state: inactive
task: hook_script
time: 2025-04-25T10:20:24.275635Z
worker: 1989
```
## Suggestions
* Note that either the main `investigate:retry` job itself or one of the other investigation jobs might be cancelled. Handling both cases might require different changes to the code
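For the case where one of the other investigation jobs is cancelled, the decision logic could look roughly like the following Python sketch. It is only an illustration and not the existing openqa-investigate code. The function names and the shape of the job dictionaries are hypothetical, and the state and result strings reflect my understanding of openQA's job constants, so they should be checked against the real API before relying on them.

```python
# Assumed openQA job states in which a job may still produce a result
PENDING_STATES = {"scheduled", "assigned", "setup", "running", "uploading"}
# Assumed results that mean the job will never finish normally
ABANDONED_RESULTS = {"obsoleted", "user_cancelled"}


def classify(job):
    """Return 'abandoned', 'pending' or 'finished' for one investigation job."""
    if job.get("state") == "cancelled" or job.get("result") in ABANDONED_RESULTS:
        return "abandoned"
    if job.get("state") in PENDING_STATES:
        return "pending"
    return "finished"


def next_action(jobs):
    """Decide whether to retry later or to write the final comment now.

    Returns a tuple (action, abandoned_ids). Cancelled or obsoleted jobs
    never count as pending, so they cannot keep the retry loop alive.
    """
    kinds = [(job["id"], classify(job)) for job in jobs]
    abandoned = [job_id for job_id, kind in kinds if kind == "abandoned"]
    if any(kind == "pending" for _, kind in kinds):
        return ("retry", abandoned)
    return ("comment", abandoned)


if __name__ == "__main__":
    jobs = [
        {"id": 1, "state": "done", "result": "passed"},
        {"id": 2, "state": "cancelled", "result": "user_cancelled"},
    ]
    print(next_action(jobs))
```

Returning the IDs of the abandoned jobs lets the final comment mention which investigation jobs never ran. The case where the main `investigate:retry` job itself is cancelled is not covered by this sketch.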