action #100973

Updated by mkittler over 2 years ago

## Motivation 

 @tinita was investigating job age alerts and found [a job with a WORKER_CLASS that doesn't match any workers](https://openqa.suse.de/tests/7068360). This was traced back to @asmorodskyi, who then identified [the change](https://gitlab.suse.de/qac/qac-openqa-yaml/-/merge_requests/414) responsible: an incorrect use of `+WORKER_CLASS` ([+WORKER_CLASS is combined rather than overridden](https://open.qa/docs/#_variable_precedence)). 

 Regardless of what caused this, openQA should cancel unmatched jobs itself instead of relying on a developer to monitor jobs and figure out what happened. 

 ## Acceptance criteria 
 * **AC1:** Cancel any scheduled jobs after a timeout 

 ## Suggestions 
 * Cancel any job that has remained in the scheduled state for multiple days; a good default is 7 days 
 * Do the cancellation in the scheduler; use an additional timer if performance is impacted 
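
 The suggested cancellation rule could be sketched roughly as follows. This is only an illustration in Python, not openQA code: the `Job` class, its fields, the `cancel_stale_scheduled_jobs` function and the reason text are all hypothetical names chosen for the example. Only the 7-day default comes from the suggestion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Default taken from the suggestion above; the constant name is hypothetical.
DEFAULT_MAX_SCHEDULED_AGE = timedelta(days=7)


@dataclass
class Job:
    # Hypothetical stand-in for a job record, not the real openQA schema.
    id: int
    state: str  # e.g. "scheduled", "running", "cancelled"
    t_created: datetime
    reason: str = ""


def cancel_stale_scheduled_jobs(jobs, now, max_age=DEFAULT_MAX_SCHEDULED_AGE):
    """Cancel jobs still in the scheduled state after max_age; return their ids."""
    cancelled = []
    for job in jobs:
        # Only jobs that never left the scheduled state are candidates.
        if job.state == "scheduled" and now - job.t_created > max_age:
            job.state = "cancelled"
            job.reason = f"scheduled for more than {max_age.days} days"
            cancelled.append(job.id)
    return cancelled


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [
        Job(1, "scheduled", now - timedelta(days=8)),
        Job(2, "scheduled", now - timedelta(days=1)),
    ]
    print(cancel_stale_scheduled_jobs(jobs, now))
```

 Passing `now` and `max_age` as arguments keeps the check easy to call from either the scheduler loop or a separate timer, whichever turns out to be cheaper.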

 ## Workaround 
 Have a person monitor alerts and investigate jobs that never run, then cancel the affected job and file a new ticket.
