action #100973
closed
Cancel any scheduled jobs after a configurable timeout, e.g. days size:M
Description
Motivation
@tinita was investigating job age alerts and found a job with a WORKER_CLASS that doesn't match any workers. This was traced to @asmorodskyi, who then identified the change responsible: an incorrect use of +WORKER_CLASS (+WORKER_CLASS is combined rather than overridden).
Regardless of what caused this, instead of having a developer monitor jobs and figure out what happened, we should have openQA cancel such unmatched jobs.
Acceptance criteria
- AC1: Cancel any scheduled jobs after a timeout
Suggestions
- Cancel any job that has remained scheduled for multiple days; a good default is 7 days
- Do the cancellation in the scheduler; use an additional timer if performance is impacted
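
The suggestions above could be sketched roughly as follows. This is only an illustration in Python, not openQA's actual implementation: the `Job` record, its field names, the `cancel_stale_scheduled_jobs` function, and the `DEFAULT_MAX_SCHEDULED_AGE` constant are all hypothetical names chosen for the example. The only value taken from the ticket is the suggested default of 7 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical default, based on the "7 days" suggestion in this ticket.
DEFAULT_MAX_SCHEDULED_AGE = timedelta(days=7)


@dataclass
class Job:
    # Illustrative stand-in for a job record; not openQA's real schema.
    id: int
    state: str           # e.g. "scheduled", "running", "done", "cancelled"
    t_created: datetime  # when the job was scheduled
    reason: str = ""     # why the job ended up in its current state


def cancel_stale_scheduled_jobs(jobs, now, max_age=DEFAULT_MAX_SCHEDULED_AGE):
    """Cancel jobs that are still scheduled after more than max_age.

    Returns the list of jobs that were cancelled by this call.
    """
    cancelled = []
    for job in jobs:
        # Only jobs that never started are candidates for cancellation.
        if job.state == "scheduled" and now - job.t_created > max_age:
            job.state = "cancelled"
            job.reason = f"scheduled for more than {max_age.days} days"
            cancelled.append(job)
    return cancelled


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [
        Job(1, "scheduled", now - timedelta(days=8)),  # stale, gets cancelled
        Job(2, "scheduled", now - timedelta(days=1)),  # still within the timeout
    ]
    for job in cancel_stale_scheduled_jobs(jobs, now):
        print(f"cancelled job {job.id}: {job.reason}")
```

Passing `now` and `max_age` as parameters keeps the check easy to call either from the scheduler's regular loop or from a separate timer, which is the fallback the second suggestion mentions if performance is impacted.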
Workaround
Have a person monitor alerts, investigate jobs that never run, cancel each such job, and file a new ticket.