action #71386

closed

Stale job detection fails with "Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm"

Added by AdamWill over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2020-09-15
Due date:
% Done:
0%
Estimated time:

Description

I just checked the logs on the Fedora openQA server and saw a ton of this:

Sep 15 22:03:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:05:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:07:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:09:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:11:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:13:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495

There is literally one of those messages every two minutes. The problem seems to be that store_column in Schema/Result/Jobs.pm assumes it will be run by the WebAPI process (which apparently has a $self->gru), but in this case it gets run by the scheduler, which I guess doesn't have one?

This code in store_column was added in https://github.com/os-autoinst/openQA/commit/1639ef7d46cfc72d0f4ab7a53603a458c3bccafd , so I think that's probably when this broke...
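
The suspected failure mode can be sketched in a few lines. This is a Python analogy, not openQA's Perl code: `WebApiApp`, `SchedulerApp`, and the bodies of `store_column` and `detect_stale_jobs` are hypothetical stand-ins. The only things taken from the logs are that a `gru` method is looked up on an object that doesn't provide it, and that the stale job detection catches the error and logs it.

```python
class WebApiApp:
    """Hypothetical stand-in for an app that provides a gru() helper."""

    def gru(self):
        return "gru"


class SchedulerApp:
    """Hypothetical stand-in for an app that never registered gru()."""


def store_column(app, state):
    # Assumption for illustration: the hook calls app.gru() unconditionally,
    # which only works when the calling process provides that helper.
    if state == "incomplete":
        app.gru()
    return state


def detect_stale_jobs(app):
    # Like the logged behaviour: the error is caught and reported, not raised.
    try:
        store_column(app, "incomplete")
        return "ok"
    except AttributeError as err:
        return f"Failed stale job detection: {err}"


web_result = detect_stale_jobs(WebApiApp())
scheduler_result = detect_stale_jobs(SchedulerApp())
print(web_result)
print(scheduler_result)
```

Because the error is swallowed and only logged, the detection fails on every run without crashing the daemon, which would match the one-message-every-two-minutes pattern.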

Actions #1

Updated by mkittler over 4 years ago

  • Assignee set to mkittler

I'm looking into it today because I'm currently dealing with the stale job detection anyway. The stale job detection is broken in other ways as well, which is likely why we didn't notice this problem until now.

Actions #2

Updated by okurz over 4 years ago

  • Priority changed from Normal to High
  • Target version set to Ready

Actions #3

Updated by mkittler over 4 years ago

  • Status changed from New to In Progress

The error has been fixed by https://github.com/os-autoinst/openQA/pull/3389, because I needed to handle the case of an app with no Gru plugin anyway to make tests work.

I also created https://github.com/os-autoinst/openQA/pull/3397 which actually loads the Gru plugin within the scheduler so the scheduler is able to enqueue the finalize tasks when marking a stale job as incomplete. I've also extended the test for the stale job detection to use the real scheduler app to cover this case.
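
The two changes can be pictured roughly as follows. This is a Python sketch with hypothetical names (`App`, `Gru`, `register_gru`, `finalize_job`, the `"finalize"` task label), not the code from the pull requests: the guard covers an app with no Gru plugin, and registering the plugin lets the task actually get enqueued.

```python
class Gru:
    """Hypothetical task queue stand-in."""

    def __init__(self):
        self.queue = []

    def enqueue(self, task, job_id):
        self.queue.append((task, job_id))


class App:
    """Hypothetical app object; gru stays None until the plugin is loaded."""

    def __init__(self):
        self.gru = None

    def register_gru(self):
        self.gru = Gru()


def finalize_job(app, job_id):
    # Guard: skip quietly when the app has no Gru plugin loaded.
    gru = getattr(app, "gru", None)
    if gru is None:
        return False
    gru.enqueue("finalize", job_id)
    return True


scheduler = App()
skipped = finalize_job(scheduler, 42)   # no plugin yet: nothing enqueued
scheduler.register_gru()                # analogous to loading the plugin
enqueued = finalize_job(scheduler, 42)  # now the task is queued
```

The guard alone stops the error but silently drops the finalize task, which is why loading the plugin in the scheduler is the more complete fix.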

Actions #4

Updated by AdamWill over 4 years ago

Thanks a lot, those look good.

Actions #5

Updated by okurz over 4 years ago

  • Status changed from In Progress to Resolved

Considering that we have not seen the error messages ourselves, and that the code has meanwhile been deployed and is running fine on o3, we can regard this as "Resolved".

Actions #6

Updated by AdamWill over 4 years ago

Yeah, I'm OK with that. The fix doesn't backport cleanly to the snapshot we're currently on, and I don't want to update to current master right now as we're in the Beta crunch, so I can't positively confirm the fix, but it sure does look right.
