action #181421

Updated by tinita 30 days ago

## Observation 

 When investigation jobs are cancelled or obsoleted, the investigate:retry job retries 1440 times with a delay of one minute (24 hours in total) before giving up: 
 https://openqa.suse.de/minion/jobs?id=15293289 

 If many of those jobs exist, Minion will spend a lot of time retrying those instead of working on other jobs that could run and do useful stuff. 

 Maybe openqa-investigate can detect that jobs are cancelled. 

 * **AC1:** If investigation jobs are cancelled, openqa-investigate detects that and creates the final comment without waiting any longer 

 And maybe the retrying can be implemented with exponential backoff as an additional improvement, for example when investigation jobs aren't picked up for a long time. 
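
 To illustrate the exponential backoff idea, here is a minimal Python sketch of a retry schedule that covers the same 24 hour window as the current 1440 retries at 60 seconds. It is not openQA or Minion code; the function name `backoff_delays`, the factor of 2 and the one hour cap are assumptions picked for illustration only.

 ```python
 def backoff_delays(base=60, factor=2, cap=3600, budget=24 * 3600):
     """Yield retry delays in seconds until the total wait reaches `budget`.

     All defaults are illustrative assumptions: start at the current
     60 s delay, double after every retry, never wait longer than `cap`.
     """
     waited = 0
     delay = base
     while waited < budget:
         # Do not overshoot the overall budget with the last delay.
         step = min(delay, cap, budget - waited)
         yield step
         waited += step
         delay = min(delay * factor, cap)


 delays = list(backoff_delays())
 fixed_retries = (24 * 3600) // 60  # current behaviour: 1440 retries of 60 s
 print(f"fixed: {fixed_retries} retries, backoff: {len(delays)} retries")
 ```

 With these assumed values the schedule needs 29 retries instead of 1440 to cover the same 24 hours, at the cost of reacting up to one hour late once the cap is reached.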

 ``` 
 --- 
 args: 
 - env ... enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de 
   notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com 
   host=openqa.suse.de investigation_gid=637 exclude_name_regex='.*(SAPHanaSR|saptune).*' 
   exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization|BCI).*' 
   grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook 
 - 17444939 
 - delay: 60 
   kill_timeout: 10s 
   retries: 1440 
   skip_rc: 142 
   timeout: 10m 
 attempts: 1 
 children: [] 
 created: 2025-04-24T15:48:34.951746Z 
 delayed: 2025-04-25T10:20:42.619188Z 
 expires: ~ 
 finished: ~ 
 id: 15293289 
 lax: 0 
 notes: 
   hook_cmd: env ... enable_force_result=true email_unreviewed=true from_email=openqa-review@suse.de 
     notification_address=discuss-openqa-auto-r-aaaagmhuypu2hq2kmzgovutmqm@suse.slack.com 
     host=openqa.suse.de investigation_gid=637 exclude_name_regex='.*(SAPHanaSR|saptune).*' 
     exclude_group_regex='.*(Development|Public Cloud|Released|Others|Kernel|Virtualization|BCI).*' 
     grep_timeout=60 nice ionice -c idle /opt/os-autoinst-scripts/openqa-label-known-issues-and-investigate-hook 
   hook_rc: 142 
   hook_result: '' 
 parents: [] 
 priority: 0 
 queue: default 
 result: ~ 
 retried: 2025-04-25T10:19:42.619188Z 
 retries: 921 
 started: 2025-04-25T10:19:35.150145Z 
 state: inactive 
 task: hook_script 
 time: 2025-04-25T10:20:24.275635Z 
 worker: 1989 
 ``` 

 ## Suggestions 

 * Note that either the main `investigate:retry` job itself or one of the other investigation jobs might be cancelled. Handling both cases might require different changes to the code.
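
 The two cases in the note above could be told apart roughly like this. It is only a sketch of the decision logic in Python, not the actual openqa-investigate implementation: the job dictionaries, the helper names and the returned action strings are hypothetical, and the state and result values (`cancelled`, `user_cancelled`, `obsoleted`, `done`) are assumptions about what the openQA job data contains.

 ```python
 # Assumed values marking a job that will never finish normally.
 CANCELLED_STATES = {"cancelled"}
 CANCELLED_RESULTS = {"user_cancelled", "obsoleted"}


 def is_cancelled(job):
     """True if the (hypothetical) job dict looks cancelled or obsoleted."""
     return (job.get("state") in CANCELLED_STATES
             or job.get("result") in CANCELLED_RESULTS)


 def is_finished(job):
     """True if there is nothing left to wait for on this job."""
     return job.get("state") == "done" or is_cancelled(job)


 def next_action(retry_job, other_jobs):
     """Decide whether to write the final comment now or retry later."""
     # Case 1: the main investigate:retry job itself was cancelled.
     if is_cancelled(retry_job):
         return "comment"
     # Case 2: other jobs may be cancelled; stop once none is still pending.
     if all(is_finished(job) for job in [retry_job, *other_jobs]):
         return "comment"
     return "retry"


 print(next_action({"state": "done", "result": "failed"},
                   [{"state": "cancelled", "result": "obsoleted"}]))
 ```

 Treating a cancelled job as finished is what lets the final comment be created without waiting for the remaining retries.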
