action #71386

Stale job detection fails with "Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm"

Added by AdamWill 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2020-09-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

I just checked the logs on the Fedora openQA server and saw a ton of this:

Sep 15 22:03:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:05:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:07:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:09:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:11:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:13:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495

Literally, I get one of those messages every two minutes. The problem seems to be that store_column in Schema/Result/Jobs.pm assumes it will be run by the WebAPI process (which apparently has a $self->gru). In this case it gets run by the scheduler, which I guess doesn't?
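
The failure mode described above can be sketched in Python. openQA itself is Perl: calling a method the object does not provide dies there with "Can't locate object method", while Python raises AttributeError instead. Every name below is a hypothetical stand-in, not openQA's actual code. That includes `Gru`, `WebApp`, `SchedulerApp`, `Job.store_column`, `mark_stale_job_incomplete` and the "finalize_job_results" task name. The sketch also assumes that setting a job's result enqueues a finalize task. The point is only that model code which takes for granted that the surrounding app has a `gru` works in one process and fails in another.

```python
class Gru:
    """Hypothetical minimal task queue standing in for the Gru plugin."""

    def __init__(self):
        self.tasks = []

    def enqueue(self, task, job_id):
        self.tasks.append((task, job_id))


class WebApp:
    """Stands in for the web UI/API app, which has the Gru plugin loaded."""

    def __init__(self):
        self.gru = Gru()


class SchedulerApp:
    """Stands in for the scheduler app before the fix: it has no gru."""


class Job:
    def __init__(self, app, job_id):
        self.app = app
        self.id = job_id
        self.result = None

    def store_column(self, column, value):
        setattr(self, column, value)
        # Assumption: setting a result enqueues a finalize task, and this
        # code takes for granted that the app it runs in provides gru.
        if column == "result":
            self.app.gru.enqueue("finalize_job_results", self.id)


def mark_stale_job_incomplete(job):
    """Mimics the scheduler catching and logging the error."""
    try:
        job.store_column("result", "incomplete")
        return "ok"
    except AttributeError as error:
        return f"Failed stale job detection: {error}"


print(mark_stale_job_incomplete(Job(WebApp(), 1)))  # prints: ok
print(mark_stale_job_incomplete(Job(SchedulerApp(), 2)))
# prints: Failed stale job detection: 'SchedulerApp' object has no attribute 'gru'
```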

This code in store_column was added in https://github.com/os-autoinst/openQA/commit/1639ef7d46cfc72d0f4ab7a53603a458c3bccafd , so I think that's probably when this broke...

History

#1 Updated by mkittler 11 months ago

  • Assignee set to mkittler

I'm looking into it today because I'm currently dealing with the stale job detection anyway. The stale job detection is broken in other ways as well, which is likely why we haven't noticed this problem so far.

#2 Updated by okurz 11 months ago

  • Priority changed from Normal to High
  • Target version set to Ready

#3 Updated by mkittler 11 months ago

  • Status changed from New to In Progress

The error has been fixed by https://github.com/os-autoinst/openQA/pull/3389. I needed to handle the case of an app without the Gru plugin anyway to make the tests work.
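
Handling an app without the Gru plugin comes down to a guard before using gru. Here is a hypothetical Python sketch of that guard. It is not the code from the pull request, and `Gru`, `AppWithGru`, `AppWithoutGru`, `Job.store_column` and the "finalize_job_results" task name are illustrative stand-ins.

```python
class Gru:
    """Hypothetical minimal task queue standing in for the Gru plugin."""

    def __init__(self):
        self.tasks = []

    def enqueue(self, task, job_id):
        self.tasks.append((task, job_id))


class AppWithGru:
    def __init__(self):
        self.gru = Gru()


class AppWithoutGru:
    """E.g. a test app, or a process that never loaded the plugin."""


class Job:
    def __init__(self, app, job_id):
        self.app = app
        self.id = job_id
        self.result = None

    def store_column(self, column, value):
        setattr(self, column, value)
        if column == "result":
            # Guard: only enqueue the finalize task if the app provides gru,
            # instead of failing when it does not.
            gru = getattr(self.app, "gru", None)
            if gru is not None:
                gru.enqueue("finalize_job_results", self.id)


with_gru = AppWithGru()
Job(with_gru, 1).store_column("result", "incomplete")
Job(AppWithoutGru(), 2).store_column("result", "incomplete")  # no error
print(with_gru.gru.tasks)  # prints [('finalize_job_results', 1)]
```

In this sketch, an app without Gru silently skips the finalize task. That is why loading the plugin in the scheduler as well, as PR 3397 does, matters for stale jobs.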

I also created https://github.com/os-autoinst/openQA/pull/3397, which actually loads the Gru plugin within the scheduler so that the scheduler can enqueue the finalize tasks when marking a stale job as incomplete. I've also extended the test for the stale job detection to use the real scheduler app, covering this case.
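
Loading the plugin in the scheduler means the scheduler app itself registers gru at startup, so the same model code can enqueue the finalize task from either process. A hypothetical Python sketch follows. `load_gru_plugin`, `SchedulerApp`, `detect_stale_jobs` and the explicit `stale_ids` set are illustrative assumptions; the real stale job detection decides staleness differently.

```python
class Gru:
    """Hypothetical minimal task queue standing in for the Gru plugin."""

    def __init__(self):
        self.tasks = []

    def enqueue(self, task, job_id):
        self.tasks.append((task, job_id))


def load_gru_plugin(app):
    """Hypothetical stand-in for registering the Gru plugin on an app."""
    app.gru = Gru()


class SchedulerApp:
    def __init__(self):
        # Register the plugin at startup, as the web UI app already does.
        load_gru_plugin(self)


class Job:
    def __init__(self, app, job_id, state="running"):
        self.app = app
        self.id = job_id
        self.state = state
        self.result = None

    def store_column(self, column, value):
        setattr(self, column, value)
        if column == "result":
            self.app.gru.enqueue("finalize_job_results", self.id)


def detect_stale_jobs(jobs, stale_ids):
    """Mark running jobs whose IDs are in stale_ids as done/incomplete."""
    for job in jobs:
        if job.state == "running" and job.id in stale_ids:
            job.store_column("state", "done")
            job.store_column("result", "incomplete")


scheduler = SchedulerApp()
jobs = [Job(scheduler, 1), Job(scheduler, 2)]
detect_stale_jobs(jobs, stale_ids={2})
print(scheduler.gru.tasks)  # prints [('finalize_job_results', 2)]
```

Driving the detection through a real `SchedulerApp` instance, rather than an app that happens to have gru, is what lets a test catch the original bug.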

#4 Updated by AdamWill 11 months ago

Thanks a lot, those look good.

#5 Updated by okurz 10 months ago

  • Status changed from In Progress to Resolved

Since we have not seen the error messages ourselves, and the code has meanwhile been deployed and is working fine on o3, we can regard this as "Resolved".

#6 Updated by AdamWill 10 months ago

Yeah, I'm OK with that. The fix doesn't backport cleanly to the snapshot we're currently on, and I don't want to update to current master right now as we're in the Beta crunch. So I can't positively confirm the fix, but it sure does look right.
