action #18462

closed

Move GRU tasks into Minion jobs

Added by coolo about 7 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2017-04-10
Due date:
% Done: 50%
Estimated time:

Description

See https://github.com/os-autoinst/openQA/issues/519 for the features that come with Minion and that we don't want to reimplement in GRU. The main reason I introduced GRU was that Minion requires Mojo::Pg, and I did not want so many new dependencies for something as simple as background jobs.


Related issues 2 (0 open, 2 closed)

Related to openQA Project - action #23318: Limit gru tasks (Resolved, EDiGiacinto, 2017-08-11)

Has duplicate openQA Project - action #30877: If gru task cannot be completed, it will attempt it forever in a loop, never reaching other tasks (Rejected, 2018-01-28)

Actions #1

Updated by AdamWill about 7 years ago

We can't "See https://github.com/os-autoinst/openQA/issues/519", since you turned off that issue tracker.

Actions #2

Updated by coolo about 7 years ago

This is Adam's issue, copied from 519 (which I did not expect to turn into a 404 when disabling issues :( ):

So, yeah, I really hate how Gru is set up to work when a task fails.

It just leaves it in the queue and loops back around. So until a higher priority task appears, it just tries the failed task over and over again. If the failure isn't transient, it'll just keep failing over and over and over and over. It never goes to sleep. It never decides "this just isn't working out" and puts the task off to the side and warns the admin or anything. Nope. It just loops around eternally, failing again and again and again. When a higher priority task appears it'll do that, but then go right back to looping on the broken task. Lower priority tasks will never get run until the failing task is cleared out somehow.

I would like to fix this; I hope I'll get some spare time to work on it. Here is my initial idea: the Gru task schema should get a new column, 'failure_count' or somesuch. It's an integer. Every time Gru ran a task and it failed, it would increment the integer. Gru's search for 'what task should I do next' should exclude tasks whose failure_count is higher than, say, 5. There would be a page or something in the admin interface which listed tasks in this state and let you manually reset their failure count to get them run again (so you could figure out what was wrong with them). Maybe Gru would have a one-time code block which searched for all tasks with failure_count > 5 and logged their IDs on startup (as just another place where the admin could notice broken tasks).
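The idea above can be sketched roughly as follows. This is only an illustration in Python, not openQA code (openQA is written in Perl): the `Task` class, `MAX_FAILURES`, and the helper names are all hypothetical, and an in-memory list stands in for the Gru task table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_FAILURES = 5  # assumption: the "say, 5" threshold from the comment above


@dataclass
class Task:
    """Hypothetical stand-in for a row in the Gru task table."""
    id: int
    priority: int
    run: Callable[[], None]
    failure_count: int = 0  # the proposed new column
    done: bool = False


def next_task(tasks: List[Task]) -> Optional[Task]:
    """Pick the highest-priority pending task, skipping tasks that failed too often."""
    eligible = [t for t in tasks if not t.done and t.failure_count <= MAX_FAILURES]
    return max(eligible, key=lambda t: t.priority, default=None)


def run_once(tasks: List[Task]) -> Optional[Task]:
    """Run one task; on failure, bump its counter instead of retrying forever."""
    task = next_task(tasks)
    if task is None:
        return None
    try:
        task.run()
        task.done = True
    except Exception:
        task.failure_count += 1
    return task


def parked_tasks(tasks: List[Task]) -> List[int]:
    """IDs of tasks an admin should look at (could be logged on startup)."""
    return [t.id for t in tasks if not t.done and t.failure_count > MAX_FAILURES]


def reset_failures(task: Task) -> None:
    """What the admin page's 'reset' action would do."""
    task.failure_count = 0
```

With this selection rule, a permanently failing high-priority task is retried a bounded number of times and then set aside, so lower-priority tasks eventually get their turn.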

Actions #3

Updated by coolo over 6 years ago

  • Target version set to Ready
Actions #4

Updated by coolo about 6 years ago

  • Has duplicate action #30877: If gru task cannot be completed, it will attempt it forever in a loop, never reaching other tasks added
Actions #5

Updated by szarate about 6 years ago

  • Status changed from New to In Progress
  • Target version changed from Ready to Current Sprint
Actions #6

Updated by szarate about 6 years ago

  • Assignee set to EDiGiacinto
Actions #7

Updated by EDiGiacinto about 6 years ago

  • % Done changed from 0 to 50

Changes are almost done: https://github.com/mudler/openQA/tree/minion

I'm currently refining it, and it still needs staging tests.

Just a side note: since GRU is tied to openQA jobs, I modified GRU to be a soft wrapper over Minion.
Minion supports additional 'data' to be carried for single jobs, but using that would require further queries to scrape the relationships that we already have defined in GRU with our schema classes.
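A rough illustration of the "soft wrapper" shape, in Python rather than openQA's Perl. Everything here is hypothetical: `BackingQueue` only stands in for the role Minion plays and does not mirror Minion's real API, and `related_job_ids` is an invented name for the relationships to openQA jobs. The point is just that the wrapper keeps its own record with those relationships and hands the queue a reference to it.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


class BackingQueue:
    """Hypothetical job queue standing in for Minion; not Minion's real API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.jobs: Dict[int, dict] = {}

    def enqueue(self, task_name: str, args: dict) -> int:
        job_id = next(self._ids)
        self.jobs[job_id] = {"task": task_name, "args": args}
        return job_id


@dataclass
class GruTask:
    """Record kept on the wrapper side, where the job relationships already live."""
    id: int
    task_name: str
    queue_job_id: int
    related_job_ids: List[int] = field(default_factory=list)  # hypothetical name


class GruWrapper:
    """Thin layer: relationships stay here, execution is delegated to the queue."""

    def __init__(self, queue: BackingQueue) -> None:
        self.queue = queue
        self.tasks: Dict[int, GruTask] = {}
        self._ids = itertools.count(1)

    def enqueue(self, task_name: str, args: dict,
                related_job_ids: Iterable[int] = ()) -> GruTask:
        gru_id = next(self._ids)
        # Only the wrapper-side id travels with the queued job.
        queue_job_id = self.queue.enqueue(task_name, {**args, "gru_id": gru_id})
        task = GruTask(gru_id, task_name, queue_job_id, list(related_job_ids))
        self.tasks[gru_id] = task
        return task

    def tasks_for_job(self, job_id: int) -> List[GruTask]:
        """Answer relationship questions without inspecting queue payloads."""
        return [t for t in self.tasks.values() if job_id in t.related_job_ids]
```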

Actions #8

Updated by EDiGiacinto almost 6 years ago

I think it's safe to merge, but we also need to merge https://build.opensuse.org/request/show/599323, as we need the latest available Minion release.

Actions #9

Updated by szarate almost 6 years ago

Actions #10

Updated by szarate almost 6 years ago

  • Status changed from In Progress to Resolved
Actions #11

Updated by szarate almost 6 years ago

  • Target version changed from Current Sprint to Done