action #71386 (closed)
Stale job detection fails with "Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm"
Description
I just checked the logs on the Fedora openQA server and saw a ton of this:
Sep 15 22:03:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:05:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:07:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:09:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:11:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
Sep 15 22:13:13 openqa01.iad2.fedoraproject.org openqa-scheduler-daemon[930]: [info] Failed stale job detection : {UNKNOWN}: Can't locate object method "gru" via package "OpenQA::Scheduler" at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/Jobs.pm line 1764. at /usr/share/openqa/script/../lib/OpenQA/Scheduler/Model/Jobs.pm line 495
I literally get one of those messages every two minutes. The problem seems to be that store_column in Schema/Result/Jobs.pm assumes it will be run by the WebAPI process (which apparently has a $self->gru), but in this case it gets run by the scheduler, which I guess doesn't have one.
This code in store_column was added in https://github.com/os-autoinst/openQA/commit/1639ef7d46cfc72d0f4ab7a53603a458c3bccafd , so I think that's probably when this broke...
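To make the suspected failure mode concrete, here is a minimal, language-neutral sketch in Python (not the actual openQA Perl code). The class names, the `on_job_done` function, and the `"finalize"` task name are all hypothetical stand-ins; the only thing taken from the report is the pattern of calling `gru` on an app object that may not provide it.

```python
class WebApiApp:
    """Hypothetical stand-in for the WebAPI app, which provides a gru helper."""

    def __init__(self):
        self.enqueued = []

    def gru(self):
        # In this sketch the app acts as its own task queue.
        return self

    def enqueue(self, task, args):
        self.enqueued.append((task, args))


class SchedulerApp:
    """Hypothetical stand-in for the scheduler app: no gru helper at all."""


def on_job_done(app, job_id):
    # Unguarded call: assumes every app exposes gru(), which is the
    # assumption the report suspects store_column of making.
    app.gru().enqueue("finalize", {"job_id": job_id})
```

Calling `on_job_done(WebApiApp(), 1)` records the task, while `on_job_done(SchedulerApp(), 2)` raises an `AttributeError`, the Python analogue of Perl's "Can't locate object method" error seen in the log.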
Updated by mkittler almost 4 years ago
- Assignee set to mkittler
I'm looking into it today because I'm currently dealing with the stale job detection anyway. The stale job detection is broken in other ways as well, which is likely why we hadn't noticed this problem so far.
Updated by okurz almost 4 years ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by mkittler almost 4 years ago
- Status changed from New to In Progress
The error has been fixed by https://github.com/os-autoinst/openQA/pull/3389 because I needed to handle the case of an app with no Gru plugin anyway to make the tests work.
I also created https://github.com/os-autoinst/openQA/pull/3397 which actually loads the Gru plugin within the scheduler so the scheduler is able to enqueue the finalize tasks when marking a stale job as incomplete. I've also extended the test for the stale job detection to use the real scheduler app to cover this case.
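The two changes described in this comment can be illustrated with a small Python sketch. It is only an analogy under stated assumptions, not the code from either pull request: `GruPlugin`, `App`, `enqueue_finalize`, and the `"finalize"` task name are hypothetical. The first part tolerates an app without a Gru plugin; the second registers the plugin on the app so the task can actually be enqueued.

```python
class GruPlugin:
    """Hypothetical task queue standing in for the Gru plugin."""

    def __init__(self):
        self.queue = []

    def enqueue(self, task, args):
        self.queue.append((task, args))


class App:
    """Hypothetical app object; plugins are attached as attributes."""

    def __init__(self, plugins=()):
        for name, plugin in plugins:
            setattr(self, name, plugin)


def enqueue_finalize(app, job_id):
    # Guard: skip enqueuing instead of failing when no Gru plugin is loaded.
    gru = getattr(app, "gru", None)
    if gru is None:
        return False
    gru.enqueue("finalize", {"job_id": job_id})
    return True
```

With `App()` the call returns `False` without raising, which corresponds to handling the plugin-less case. With `App(plugins=[("gru", GruPlugin())])` the task lands in the queue, which corresponds to loading the plugin in the scheduler so stale jobs marked incomplete still get their finalize tasks.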
Updated by okurz almost 4 years ago
- Status changed from In Progress to Resolved
Considering that we have not seen the error messages ourselves, and that the code has meanwhile been deployed on o3 and is running fine, we can regard this as "Resolved".
Updated by AdamWill almost 4 years ago
Yeah, I'm OK with that. The fix doesn't backport cleanly to the snapshot we're currently on, and I don't want to update to current master right now as we're in the Beta crunch, so I can't positively confirm the fix, but it sure does look right.