action #18462

Move GRU tasks into Minion jobs

Added by coolo almost 3 years ago. Updated almost 2 years ago.

Status: Resolved
Start date: 10/04/2017
Priority: Normal
Due date:
Assignee: EDiGiacinto
% Done: 50%
Category: Feature requests
Target version: Done
Difficulty:
Duration:

Description

See https://github.com/os-autoinst/openQA/issues/519 for the features that come with Minion, which we don't want to reimplement in GRU. The main reason I introduced GRU was that Minion requires Mojo::Pg, and I did not want to pull in so many new dependencies for something as simple as background jobs.


Related issues

Related to openQA Project - action #23318: Limit gru tasks (Resolved, 11/08/2017)
Duplicated by openQA Project - action #30877: If gru task cannot be completed, it will attempt it forev... (Rejected, 28/01/2018)

History

#1 Updated by AdamWill almost 3 years ago

We can't "See https://github.com/os-autoinst/openQA/issues/519", since you turned off that issue tracker.

#2 Updated by coolo almost 3 years ago

This is Adam's issue (copied from 519, which I did not expect to turn into a 404 when I disabled issues :( ).

So, yeah, I really hate how Gru is set up to work when a task fails.


It just leaves it in the queue and loops back around. So until a higher priority task appears, it just tries the failed task over and over again. If the failure isn't transient, it'll just keep failing over and over and over and over. It never goes to sleep. It never decides "this just isn't working out" and puts the task off to the side and warns the admin or anything. Nope. It just loops around eternally, failing again and again and again. When a higher priority task appears it'll do that, but then go right back to looping on the broken task. Lower priority tasks will never get run until the failing task is cleared out somehow.
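A minimal sketch of the loop described above, assuming a simplified in-memory queue where the highest priority value runs first. The `Task` class and `naive_gru_step` function are hypothetical illustrations, not openQA's actual Gru code: a failing task is never removed, so it is picked again on every iteration and the lower-priority task starves.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int            # hypothetical: higher value runs first
    will_fail: bool = False  # simulates a non-transient failure


def naive_gru_step(queue):
    """Run the highest-priority task; remove it only if it succeeds.

    Models the behaviour described above: a failed task stays in the
    queue and is selected again on the next iteration.
    """
    task = max(queue, key=lambda t: t.priority)
    if not task.will_fail:
        queue.remove(task)
    return task.name


queue = [Task("broken", priority=10, will_fail=True), Task("cleanup", priority=1)]
ran = [naive_gru_step(queue) for _ in range(5)]
# ran == ['broken', 'broken', 'broken', 'broken', 'broken']

# A higher-priority task gets done, then the loop returns to the broken one.
queue.append(Task("urgent", priority=20))
ran += [naive_gru_step(queue) for _ in range(3)]
# ran[5:] == ['urgent', 'broken', 'broken']; "cleanup" never runs
```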


I would like to fix this; I hope I'll get some spare time to work on it. Here is my initial idea: the Gru task schema should get a new column, 'failure_count' or somesuch. It's an integer. Every time Gru ran a task and it failed, it would increment the integer. Gru's search for 'what task should I do next' should exclude tasks whose failure_count is higher than, say, 5. There would be a page or something in the admin interface which listed tasks in this state and let you manually reset their failure count to get them run again (so you could figure out what was wrong with them). Maybe Gru would have a one-time code block which searched for all tasks with failure_count > 5 and logged their IDs on startup (as just another place where the admin could notice broken tasks).
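The proposal can be sketched as follows, assuming an in-memory list stands in for the Gru task table (in a real schema the exclusion would more likely be a condition in the "next task" query). `failure_count` and the threshold of 5 come from the proposal; `next_task`, `gru_step`, `stuck_tasks` and `reset` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

MAX_FAILURES = 5  # "higher than, say, 5" from the proposal


@dataclass
class Task:
    name: str
    priority: int             # hypothetical: higher value runs first
    will_fail: bool = False   # simulates a non-transient failure
    failure_count: int = 0    # the proposed new column


def next_task(queue):
    """Pick the highest-priority task whose failure_count is not above MAX_FAILURES."""
    runnable = [t for t in queue if t.failure_count <= MAX_FAILURES]
    return max(runnable, key=lambda t: t.priority) if runnable else None


def gru_step(queue):
    task = next_task(queue)
    if task is None:
        return None
    if task.will_fail:
        task.failure_count += 1  # stays queued, but each failure is counted
    else:
        queue.remove(task)
    return task.name


def stuck_tasks(queue):
    """Tasks the admin page (or the startup log) would list."""
    return [t.name for t in queue if t.failure_count > MAX_FAILURES]


def reset(task):
    """Manual reset from the admin interface so the task is tried again."""
    task.failure_count = 0


queue = [Task("broken", priority=10, will_fail=True), Task("cleanup", priority=1)]
ran = [gru_step(queue) for _ in range(8)]
# ran == ['broken'] * 6 + ['cleanup', None]
# stuck_tasks(queue) == ['broken']
```

Unlike the current behaviour, the broken task is parked after its sixth failure, so lower-priority work runs and the admin gets a list of tasks to investigate.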

#3 Updated by coolo over 2 years ago

  • Target version set to Ready

#4 Updated by coolo about 2 years ago

  • Duplicated by action #30877: If gru task cannot be completed, it will attempt it forever in a loop, never reaching other tasks added

#5 Updated by szarate almost 2 years ago

  • Status changed from New to In Progress
  • Target version changed from Ready to Current Sprint

#6 Updated by szarate almost 2 years ago

  • Assignee set to EDiGiacinto

#7 Updated by EDiGiacinto almost 2 years ago

  • % Done changed from 0 to 50

Changes are almost done: https://github.com/mudler/openQA/tree/minion

I'm currently refining it, and it still needs testing on staging.

Just a side note: since GRU is tied to openQA jobs, I turned GRU into a thin wrapper over Minion.
Minion supports carrying additional 'data' with individual jobs, but that would require further queries to reconstruct the relationships we already define in GRU through our schema classes.
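A rough sketch of the "thin wrapper" idea, assuming that Gru keeps its own table linking each task to the openQA jobs waiting on it and hands Minion only the Gru id. `MinionLikeQueue`, `GruTask`, `Gru` and `example_task` are hypothetical in-memory stand-ins; they do not reproduce Minion's or openQA's real (Perl) APIs.

```python
from itertools import count


class MinionLikeQueue:
    """Hypothetical stand-in for a Minion queue: stores only task name and args."""

    def __init__(self):
        self.jobs = {}
        self._ids = count(1)

    def enqueue(self, task, args):
        job_id = next(self._ids)
        self.jobs[job_id] = {"task": task, "args": args, "state": "inactive"}
        return job_id


class GruTask:
    """Hypothetical Gru row: keeps the link to the openQA jobs blocked on it."""

    def __init__(self, gru_id, taskname, blocked_jobs):
        self.id = gru_id
        self.taskname = taskname
        self.blocked_jobs = list(blocked_jobs)


class Gru:
    def __init__(self, minion):
        self.minion = minion
        self.tasks = {}  # stands in for Gru's own schema table
        self._ids = count(1)

    def enqueue(self, taskname, args, blocked_jobs):
        gru_id = next(self._ids)
        self.tasks[gru_id] = GruTask(gru_id, taskname, blocked_jobs)
        # The Minion job carries only the Gru id; relationships stay in Gru's table.
        return self.minion.enqueue(taskname, {"gru_id": gru_id, **args})

    def jobs_blocked_by(self, minion_job_id):
        gru_id = self.minion.jobs[minion_job_id]["args"]["gru_id"]
        return self.tasks[gru_id].blocked_jobs


gru = Gru(MinionLikeQueue())
job_id = gru.enqueue("example_task", {"url": "http://example.com/x.iso"}, blocked_jobs=[42, 43])
# gru.jobs_blocked_by(job_id) == [42, 43]
```

Looking up the Gru row by id reuses relationships that already exist, instead of encoding them into each Minion job's payload and querying them back out.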

#8 Updated by EDiGiacinto almost 2 years ago

I think it's safe to merge, but we also need to merge https://build.opensuse.org/request/show/599323, as we need the latest available Minion release.

#9 Updated by szarate almost 2 years ago

#10 Updated by szarate almost 2 years ago

  • Status changed from In Progress to Resolved

#11 Updated by szarate almost 2 years ago

  • Target version changed from Current Sprint to Done
