action #89224
closed
Limit execution time of hook scripts run within Minion
Added by mkittler about 4 years ago.
Updated about 4 years ago.
Category: Feature requests
Description
motivation
Today we saw a few finalize_job_results jobs block the whole Minion job queue for quite a while (until manually aborted) because the command grep -qPzo '(?s)Gru job failed.*connection error.*Inactivity timeout' from the hook script openqa-label-known-issues kept the Minion workers busy.
acceptance criteria
- AC1: Hook scripts are aborted after a configurable timeout.
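As an illustration of AC1, here is a minimal Python sketch of running a hook command with a timeout. openQA itself is written in Perl, so this is not the actual implementation; the function name run_hook and the stand-in commands are hypothetical and only show the abort-on-timeout behaviour.

```python
import subprocess
import sys


def run_hook(command, timeout_seconds):
    """Run a hook command and return its exit code.

    Returns None if the command exceeded timeout_seconds and was aborted.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return None
    return completed.returncode


# Hypothetical stand-ins for a slow and a fast hook script.
slow_hook = [sys.executable, "-c", "import time; time.sleep(30)"]
fast_hook = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(run_hook(slow_hook, 0.5))
    print(run_hook(fast_hook, 30))
```

Note that this only kills the direct child. A hook script that spawns further processes (such as a shell script calling grep) would likely need process-group handling on top of this.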
further notes
- I'm not sure what is special about the openQA jobs that take so long to investigate, but https://openqa.suse.de/tests/5527320 is one of them.
- Maybe openqa-label-known-issues can be made more efficient as well. Note that the mentioned grep command actually caused considerable CPU usage, so the script wasn't just waiting for something.
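One possible direction for making the check cheaper, sketched in Python under the assumption that the three parts of the pattern are plain literal strings: look for each one in order with a simple substring search instead of a multi-line regex with two .* wildcards. The function name find_in_order is hypothetical and this is not how openqa-label-known-issues is implemented.

```python
def find_in_order(text, needles):
    """Return True if all needles occur in text, in the given order, without overlapping."""
    pos = 0
    for needle in needles:
        idx = text.find(needle, pos)
        if idx == -1:
            return False
        # Continue searching after the end of the match just found.
        pos = idx + len(needle)
    return True


# The literal parts of the grep pattern quoted in the ticket.
NEEDLES = ("Gru job failed", "connection error", "Inactivity timeout")

if __name__ == "__main__":
    log = "Gru job failed\nsome output\nconnection error\nInactivity timeout\n"
    print(find_in_order(log, NEEDLES))
```

Each part of the text is scanned at most once, so the cost grows linearly with the size of the log.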
- Target version set to Ready
- Status changed from New to Workable
- Assignee set to mkittler
I'll also add an upstream feature to Minion to help with this: a fast lane for high priority jobs, since it's a pretty common issue to have a bunch of very slow jobs clogging the queue. Then we can use --jobs 12 --spare 4 and low priority jobs will only use the first 12 slots, while 4 would always be reserved for high priority jobs.
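The slot arithmetic described here can be modelled with a small Python sketch. This is only a model of the idea, not Minion's code; the function may_start and the spare_min_priority threshold are assumptions made for illustration.

```python
def may_start(priority, busy, jobs=12, spare=4, spare_min_priority=1):
    """Decide whether a job of the given priority may start.

    busy is the number of slots currently occupied. Regular slots are open
    to every job; spare slots only to jobs at or above spare_min_priority
    (an assumed threshold for what counts as high priority).
    """
    if busy < jobs:
        return True
    return priority >= spare_min_priority and busy < jobs + spare


if __name__ == "__main__":
    # All 12 regular slots are taken by slow jobs.
    print(may_start(priority=0, busy=12))
    print(may_start(priority=5, busy=12))
```

With 12 regular slots full, a low priority job has to wait while a high priority job can still take one of the 4 spare slots, up to a total of 16 busy slots.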
- Due date set to 2021-03-16
Setting due date based on mean cycle time of SUSE QE Tools
- Status changed from Workable to In Progress
- Status changed from In Progress to Feedback
Note that for the cleanup we already limit the number of concurrently running jobs. The actual problem was the hook script execution (for automatic job investigation).
I'll check how to make use of the new Minion feature. Maybe we need to lower/increase the priority of some job types.
The new Minion version is in Factory and in our repos for Leap. I'll keep the ticket in feedback to wait until it is deployed in production.
- Status changed from Feedback to Resolved
The Minion dashboard on OSD now shows 2 spare workers. Together with the other changes this should make long-running hook scripts harmless.