action #91157

[Alerting] web UI: Too many Minion job failures alert: limit_results_and_logs failed

Added by Xiaojing_liu 6 months ago. Updated 6 months ago.

Category: Concrete Bugs

The task limit_results_and_logs failed; the Minion job dump shows:

args: []
attempts: 1
children: []
created: 2021-04-14T22:00:06.22281Z
delayed: 2021-04-14T22:01:06.30156Z
expires: 2021-04-16T22:00:06.22281Z
finished: 2021-04-14T22:42:21.30191Z
id: 1683710
lax: 0
notes:
  gru_id: 28797918
parents: []
priority: 5
queue: default
result: |
  Can't call method "all" on unblessed reference at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/ line 262.
retried: 2021-04-14T22:00:06.30156Z
retries: 1
started: 2021-04-14T22:01:08.42185Z
state: failed
task: limit_results_and_logs
time: 2021-04-15T01:51:50.84057Z
worker: 432

See details in

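For context, "Can't call method … on unblessed reference" is Perl's error for calling a method on a plain data reference that is not a blessed object. A hypothetical minimal sketch (not openQA's actual code) of how the cleanup task can hit this: somewhere a plain hash reference ends up where a DBIx::Class ResultSet object was expected, so a method call like `->all` dies:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the bad value: a plain (unblessed) hashref
# where a DBIx::Class ResultSet object was expected.
my $groups = {};

# ->all is a ResultSet method; calling it on an unblessed reference dies.
eval { $groups->all };
my $error = $@;

# prints: Can't call method "all" on unblessed reference at ... line ...
print $error;
```

The `eval` only serves to capture the error for illustration; in the real cleanup task the call was not guarded, so the Minion job failed outright.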

#1 Updated by mkittler 6 months ago

  • Tags set to alert
  • Category set to Concrete Bugs

#2 Updated by mkittler 6 months ago

  • Assignee set to mkittler

#4 Updated by mkittler 6 months ago

  • Status changed from New to In Progress

#5 Updated by okurz 6 months ago

  • Target version set to Ready

PR merged.

#6 Updated by openqa_review 6 months ago

  • Due date set to 2021-04-30

Setting due date based on mean cycle time of SUSE QE Tools

#7 Updated by mkittler 6 months ago

  • Status changed from In Progress to Feedback

#8 Updated by mkittler 6 months ago

  • Status changed from Feedback to In Progress

My fix only landed on OSD with today's deployment, so I'm re-triggering the last limit_results_and_logs job now. It is currently still active. I hope it'll work because we've already received the filesystem alert as results have been piling up.

I'll also check which group has been configured with an infinite retention period. Such a configuration triggered the bug, but regardless of the bug now being fixed, it is questionable to keep results forever.

#9 Updated by mkittler 6 months ago

The job group had no/infinite retention periods configured for its results. I've just set it to match its parent group, as we cannot keep the results around forever.
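For reference, openQA's retention periods are per-job-group settings that fall back to instance-wide defaults in openqa.ini. A sketch of such defaults, with key names as found in openQA's sample config; the values and the "0 means keep forever" semantics here are illustrative assumptions, not this instance's actual configuration:

```ini
; Hypothetical defaults applied to new job groups (openqa.ini).
; A duration of 0 (or leaving the group-level setting empty) means
; "keep forever" -- the kind of configuration this ticket is about.
[default_group_limits]
log_storage_duration = 30
important_log_storage_duration = 120
result_storage_duration = 365
important_result_storage_duration = 0
```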

Meanwhile, the cleanup is still running, but it might already bring the results under the alert threshold.

#10 Updated by mkittler 6 months ago

  • Status changed from In Progress to Resolved

The cleanup task succeeded again and we're still below the alert threshold.
