action #91157

closed

[Alerting] web UI: Too many Minion job failures alert: limit_results_and_logs failed

Added by Xiaojing_liu over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Category: Regressions/Crashes
Target version: Ready
Start date: 2021-04-15
Due date: 2021-04-30
% Done: 0%
Estimated time:
Tags: alert

Description

Observation

The Minion task limit_results_and_logs failed with the following job details:

args: []
attempts: 1
children: []
created: 2021-04-14T22:00:06.22281Z
delayed: 2021-04-14T22:01:06.30156Z
expires: 2021-04-16T22:00:06.22281Z
finished: 2021-04-14T22:42:21.30191Z
id: 1683710
lax: 0
notes:
  gru_id: 28797918
parents: []
priority: 5
queue: default
result: |
  Can't call method "all" on unblessed reference at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobGroups.pm line 262.
retried: 2021-04-14T22:00:06.30156Z
retries: 1
started: 2021-04-14T22:01:08.42185Z
state: failed
task: limit_results_and_logs
time: 2021-04-15T01:51:50.84057Z
worker: 432

See the details at https://openqa.suse.de/minion/jobs?state=failed&offset=0

Actions #1

Updated by mkittler over 3 years ago

  • Tags set to alert
  • Category set to Regressions/Crashes
Actions #2

Updated by mkittler over 3 years ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler over 3 years ago

  • Status changed from New to In Progress
Actions #5

Updated by okurz over 3 years ago

  • Target version set to Ready

PR merged.

Actions #6

Updated by openqa_review over 3 years ago

  • Due date set to 2021-04-30

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by mkittler over 3 years ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by mkittler over 3 years ago

  • Status changed from Feedback to In Progress

My fix only landed on OSD with today's deployment, so I'm re-triggering the last limit_results_and_logs job now. The job is still active. I hope it succeeds, because we have already received the filesystem alert as results have been piling up.

I'll also check which job group has been configured with an infinite retention period. Such a configuration triggered the bug, and even though the bug is now fixed, keeping results forever is a questionable configuration.
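The failure mode can be illustrated with a minimal Python sketch. It is not openQA's code (the failing code lives in lib/OpenQA/Schema/Result/JobGroups.pm, which is Perl). The names `Job`, `JobGroup`, `retention_days`, `ResultSet` and `expired_jobs` are hypothetical, and the sketch assumes that an unset retention period (`None`) means keeping results forever. The point it shows: the cleanup lookup should return the same kind of object whether or not the group has a finite retention period. The Perl error above says `all` was called on an unblessed (plain) reference rather than on an object, which is the kind of mistake a special-cased return value produces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Job:
    job_id: int
    finished: datetime


@dataclass
class JobGroup:
    name: str
    # Hypothetical field: None means "keep results forever" (infinite retention).
    retention_days: Optional[int]
    jobs: List[Job] = field(default_factory=list)


class ResultSet:
    """Hypothetical stand-in for an ORM result set offering .all()."""

    def __init__(self, rows: List[Job]):
        self._rows = list(rows)

    def all(self) -> List[Job]:
        return list(self._rows)


def expired_jobs(group: JobGroup, now: datetime) -> ResultSet:
    """Return the group's jobs whose results are past the retention period.

    Always returns a ResultSet, even when nothing expires, so callers can
    call .all() unconditionally. Returning a bare list here instead would
    raise AttributeError in the caller, analogous to the Perl error above.
    """
    if group.retention_days is None:
        # Infinite retention: nothing expires, but keep the return type uniform.
        return ResultSet([])
    cutoff = now - timedelta(days=group.retention_days)
    return ResultSet([job for job in group.jobs if job.finished < cutoff])


if __name__ == "__main__":
    now = datetime(2021, 4, 15)
    old = Job(1, datetime(2021, 1, 1))
    recent = Job(2, datetime(2021, 4, 14))
    for group in (JobGroup("finite", 30, [old, recent]),
                  JobGroup("infinite", None, [old, recent])):
        print(group.name, [job.job_id for job in expired_jobs(group, now).all()])
```

Run as a script, this prints `finite [1]` and `infinite []`. Returning an empty result set for the infinite case lets every caller treat all groups the same way instead of special-casing unlimited retention.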

Actions #9

Updated by mkittler over 3 years ago

The job group https://openqa.suse.de/admin/job_templates/276 had no result retention periods configured, meaning its results were kept forever. I've just set them to match its parent group (see https://openqa.suse.de/admin/job_templates/276) because we cannot keep the results around forever.

The cleanup is still running, but it has already brought the results below the alert threshold.

Actions #10

Updated by mkittler over 3 years ago

  • Status changed from In Progress to Resolved

The cleanup task succeeded again and we're still below the alert threshold.
