action #123625

No job_restart event for jobs restarted via `RETRY`

Added by AdamWill about 2 months ago. Updated 14 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2023-01-24
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

If you use the RETRY auto-retry mechanism, AFAICS, when the job is restarted, no event is emitted. Logically speaking, I'd expect a job_restart event to be emitted, as it is when you restart a job manually.

This seems like it should be easy to fix so I was just going to send a PR, but actually the codepaths are kind of long and complex and there's a question without an obvious answer (to me).

RETRY handling starts in done() in lib/OpenQA/Schema/Results/Jobs.pm, which checks whether RETRY is set and the job failed, then calls $self->auto_duplicate, which goes through a whole other pile of functions that wind up actually restarting the job. So I could just stick an emit_event in that pile somewhere; there's precedent for using emit_event in that file, where update_result() calls OpenQA::App->singleton->emit_event().

However, there's another path that calls auto_duplicate, and there the caller emits the event itself: in lib/OpenQA/WebAPI/Controller/API/V1/Job.pm, _restart() calls OpenQA::Resource::Jobs::job_restart(), which calls $job->auto_duplicate(), and then _restart() emits the event.

So, what's the best way to do this? Move the event emission somewhere under $job->auto_duplicate() and drop the emit_event() from _restart() in API/V1/Job.pm? Or have done() emit the event after calling $self->auto_duplicate(), kinda mirroring what _restart() does? Or is there a better idea? I'm not really sure.
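To make the gap and one of the options concrete, here is a toy model in Python rather than openQA's Perl. Every name in it (`Job`, `api_restart`, `emit_event`, the `emit_on_retry` flag, the event payload keys) is a hypothetical stand-in chosen for illustration, and the RETRY check is simplified to "is the setting present"; it is a sketch of the two call paths described above, not the real code.

```python
# Toy model of the two restart paths. All names and the payload shape are
# assumptions for illustration; this is not openQA's actual API.
import itertools

events = []                      # stands in for whatever consumes emitted events
_job_ids = itertools.count(1)    # simple id generator for the toy jobs


def emit_event(name, data):
    """Record an event instead of publishing it."""
    events.append((name, data))


class Job:
    def __init__(self, settings=None):
        self.id = next(_job_ids)
        self.settings = dict(settings or {})
        self.result = None

    def auto_duplicate(self):
        """Create the clone. Emits nothing, matching the behaviour described above."""
        return Job(self.settings)

    def done(self, result, emit_on_retry=False):
        """Finish the job; auto-restart it if RETRY is set and it failed."""
        self.result = result
        if result == "failed" and self.settings.get("RETRY"):
            clone = self.auto_duplicate()
            # Option: done() emits after cloning, mirroring the manual path.
            if emit_on_retry:
                emit_event("job_restart", {"id": self.id, "clone_id": clone.id})
            return clone
        return None


def api_restart(job):
    """Manual restart path: the caller emits the event itself."""
    clone = job.auto_duplicate()
    emit_event("job_restart", {"id": job.id, "clone_id": clone.id})
    return clone
```

With `emit_on_retry=False` the model reproduces the reported behaviour: `api_restart(job)` records a `job_restart` event, while `job.done("failed")` on a job with RETRY set creates a clone silently. The other option, moving the emission into `auto_duplicate()` and dropping it from the caller, would remove the duplication between the two paths at the cost of touching the manual path as well.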

History

#1 Updated by okurz about 2 months ago

  • Category set to Feature requests
  • Target version set to future

So the question is whether the automatic retry should be considered the same kind of "restart" as the manual restarts. The idea was that only manually or externally triggered restarts would trigger the job_restart event. May I ask what you would need the event for? Maybe we can introduce another specific event?

#2 Updated by AdamWill about 2 months ago

Sure. We want an external system to know when openQA jobs are scheduled (and, in future, when they start running). The most obvious way to know this is for openQA to communicate when it happens. So ideally I want there to be an event emitted any time a job is created (and also any time a job starts running, but we'll come to that later).

Note openQA has already had one case where there was previously a 'similar' event (job_duplicate) and it got rolled into the job_restart event for the sake of simplicity. So I kinda figured openQA would just want to use job_restart again in this case and not invent some new event. Practically speaking, though, it isn't really an issue: handling a different event name wouldn't be a problem at all with the approach we (Fedora) will be using to implement what we want to do.

#3 Updated by AdamWill about 2 months ago

ping? any thoughts here? Thanks!

#4 Updated by okurz about 2 months ago

I would give others some time to bring up their thoughts. SUSE currently has HackWeek so please expect a delay in comments this week.

#5 Updated by AdamWill about 2 months ago

ah, thanks, wasn't aware that was going on.

#6 Updated by mkittler 14 days ago

It would make most sense if there's one restart event regardless of the case, with the event carrying additional information (e.g. the user that restarted the job, or that it was due to RETRY, or that the reason matched the auto-clone regex).

It would likely make sense if the reason was passed to auto_duplicate and that function then emits the event (unless there was an error). The auto_duplicate function is e.g. called when setting a job "done". This in turn can happen from multiple services (main web UI, GRU, possibly more). So in order to make sending events from auto_duplicate work in all cases you need to ensure that plugins that use those events (AMQP, audit log) are loaded on startup of those services.
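The suggestion above can be sketched as follows, again as a Python toy model and not openQA code. The `reason` and `user` parameters, the `DuplicationError` exception, the `clonable` flag and the payload keys are all assumptions made for illustration; the point is only that the function doing the duplication emits a single event type carrying the cause, and emits nothing when duplication fails.

```python
# Toy model: auto_duplicate receives the reason and emits the event itself.
# All names and the payload shape are hypothetical, not openQA's real API.
emitted = []  # stands in for the plugins (e.g. AMQP, audit log) that consume events


def emit_event(name, data):
    """Record an event instead of publishing it."""
    emitted.append((name, data))


class DuplicationError(Exception):
    """Raised when a job cannot be cloned (hypothetical error type)."""


class Job:
    def __init__(self, job_id, clonable=True):
        self.id = job_id
        self.clonable = clonable

    def auto_duplicate(self, reason, user=None):
        """Clone the job and emit one job_restart event describing why."""
        if not self.clonable:
            # Error path: raise before emitting, so no event is sent.
            raise DuplicationError(f"job {self.id} cannot be duplicated")
        clone = Job(self.id + 1000)  # arbitrary id scheme for the toy model
        emit_event(
            "job_restart",
            {"id": self.id, "clone_id": clone.id, "reason": reason, "user": user},
        )
        return clone


# Every caller goes through the same function and only supplies the cause.
Job(1).auto_duplicate("manual", user="alice")
Job(2).auto_duplicate("RETRY")
```

A consumer can then tell a manual restart from an automatic one by inspecting `reason` instead of subscribing to separate event names. As noted above, this only works if the event-consuming plugins are loaded in every service that can end up calling the function.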
