No job_restart event for jobs restarted via `RETRY`
If you use the `RETRY` auto-retry mechanism, AFAICS, when the job is restarted, no event is emitted. Logically speaking, I'd expect a `job_restart` event to be emitted, as it is when you restart a job manually.
This seems like it should be easy to fix so I was just going to send a PR, but actually the codepaths are kind of long and complex and there's a question without an obvious answer (to me).
`RETRY` handling starts in `done()` in lib/OpenQA/Schema/Results/Jobs.pm, which checks whether `RETRY` is set and the job failed, and if so calls `$self->auto_duplicate`, which goes through a whole other pile of functions that wind up actually restarting the job. So I could just stick an `emit_event` in that pile somewhere; there's a precedent for using `emit_event` in that file, in `update_result()`. However, there's another path where something calls `auto_duplicate` and that caller emits the event itself: in lib/OpenQA/WebAPI/Controller/API/V1/Job.pm there is a call to `OpenQA::Resource::Jobs::job_restart()`, which calls `$job->auto_duplicate()`, and then `_restart()` emits the event.
So, what's the best way to do this? Move the event emission somewhere under `$job->auto_duplicate()` and drop the one in API/V1/Job.pm? Or have `done()` emit the event after calling `$self->auto_duplicate()`, kinda mirroring what `_restart()` does? Or is there a better idea? I'm not really sure.
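To make the asymmetry concrete, here is a toy model of the two paths described above. It is written in Python for brevity and is not openQA's Perl code: the `Job` class, the `events` list, `api_restart()` and the event payload are all illustrative assumptions, and only the names `done`, `auto_duplicate`, `emit_event`, `RETRY` and `job_restart` come from the description.

```python
# Toy model of the two call paths; NOT openQA code, all details are assumed.
events = []


def emit_event(name, data):
    """Stand-in for the real event emitter: just records the event."""
    events.append((name, data))


class Job:
    _next_id = 100

    def __init__(self, job_id, settings=None):
        self.id = job_id
        self.settings = settings or {}
        self.result = None

    def auto_duplicate(self):
        # Creates the clone but, as reported, emits nothing itself.
        Job._next_id += 1
        return Job(Job._next_id, dict(self.settings))

    def done(self, result):
        self.result = result
        # Automatic path: RETRY is set and the job failed.
        if result == "failed" and int(self.settings.get("RETRY", 0)) > 0:
            return self.auto_duplicate()  # no event on this path
        return None


def api_restart(job):
    # Manual path: the caller emits the event after duplicating.
    clone = job.auto_duplicate()
    emit_event("job_restart", {"id": job.id, "result": clone.id})
    return clone
```

In this model, `Job(1, {"RETRY": "1"}).done("failed")` produces a clone while `events` stays empty, whereas `api_restart(Job(2))` records one `job_restart` event. The second option above would amount to adding an `emit_event` call inside `done()` right after the `auto_duplicate()` call.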
#1 Updated by okurz about 2 months ago
- Category set to Feature requests
- Target version set to future
So the question is whether the automatic retry should be considered the same kind of "restart" as the manual restarts. The idea was that only manually or externally triggered restarts would trigger the job_restart event. May I ask what you would need the event for? Maybe we can introduce another specific event?
#2 Updated by AdamWill about 2 months ago
Sure. We want an external system to know when openQA jobs are scheduled (and, in future, when they start running). The most obvious way to know this is for openQA to communicate when it happens. So ideally I want there to be an event emitted any time a job is created (and also any time a job starts running, but we'll come to that later).
Note openQA has already had one case where there was previously a 'similar' event (job_duplicate) and it got rolled into the job_restart event for the sake of simplicity. So I kinda figured openQA would just want to use job_restart again in this case and not invent some new event. Practically speaking, though, it isn't really an issue: handling a different event name would be no problem at all with the approach we (Fedora) will be using to implement what we want to do.
#3 Updated by AdamWill about 2 months ago
ping? any thoughts here? Thanks!
#4 Updated by okurz about 2 months ago
I would give others some time to bring up their thoughts. SUSE currently has HackWeek so please expect a delay in comments this week.
#5 Updated by AdamWill about 2 months ago
ah, thanks, wasn't aware that was going on.
#6 Updated by mkittler 14 days ago
It would make most sense if there's one restart event regardless of the case, but the event would carry additional information (e.g. the user that restarted the job, or that it was due to `RETRY`, or that the reason matched the auto-clone regex).
It would likely make sense if the reason was passed to `auto_duplicate` and that function then emits the event (unless there was an error). The `auto_duplicate` function is e.g. called when setting a job "done". This in turn can happen from multiple services (main web UI, GRU, possibly more). So in order to make sending events from `auto_duplicate` work in all cases you need to ensure that plugins that use those events (AMQP, audit log) are loaded on startup of those services.
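A minimal sketch of that proposal, in Python rather than openQA's Perl and with every detail assumed: the `reason` and `user` parameters, the payload keys, the `_clone()` helper and `DuplicationError` are hypothetical names for illustration only. The point is just the shape: the caller passes a reason, and the event is emitted from inside `auto_duplicate` only when duplication succeeded.

```python
# Sketch of "auto_duplicate emits the event"; NOT openQA code, names are assumed.
events = []


def emit_event(name, data):
    """Stand-in for the real event emitter: just records the event."""
    events.append((name, data))


class DuplicationError(Exception):
    """Hypothetical error raised when a job cannot be duplicated."""


class Job:
    def __init__(self, job_id, clonable=True):
        self.id = job_id
        self.clonable = clonable

    def _clone(self):
        # Hypothetical helper standing in for the real duplication logic.
        if not self.clonable:
            raise DuplicationError(f"job {self.id} cannot be duplicated")
        return Job(self.id + 1000)

    def auto_duplicate(self, reason, user=None):
        clone = self._clone()  # raises on error, so no event is emitted then
        emit_event(
            "job_restart",
            {"id": self.id, "new_id": clone.id, "reason": reason, "user": user},
        )
        return clone
```

With this shape, the manual path would call something like `job.auto_duplicate("manual", user=...)` and the `done()` path `job.auto_duplicate("RETRY")`, so neither caller needs its own `emit_event`. It does not address the plugin-loading concern: the emitter still has to be wired up in every service that can reach `auto_duplicate`.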