action #91157: [Alerting] web UI: Too many Minion job failures alert: limit_results_and_logs failed - openQA Project (public) - openSUSE Project Management Tool

closed

Added by Xiaojing_liu about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Category: Regressions/Crashes
Target version: Ready
Start date: 2021-04-15
Due date: 2021-04-30
% Done:
Estimated time:
Tags: alert

Description

Observation

The Minion task limit_results_and_logs failed. The job information shows:

args: []
attempts: 1
children: []
created: 2021-04-14T22:00:06.22281Z
delayed: 2021-04-14T22:01:06.30156Z
expires: 2021-04-16T22:00:06.22281Z
finished: 2021-04-14T22:42:21.30191Z
id: 1683710
lax: 0
notes:
  gru_id: 28797918
parents: []
priority: 5
queue: default
result: |
  Can't call method "all" on unblessed reference at /usr/share/openqa/script/../lib/OpenQA/Schema/Result/JobGroups.pm line 262.
retried: 2021-04-14T22:00:06.30156Z
retries: 1
started: 2021-04-14T22:01:08.42185Z
state: failed
task: limit_results_and_logs
time: 2021-04-15T01:51:50.84057Z
worker: 432

See details in https://openqa.suse.de/minion/jobs?state=failed&offset=0

Updated by mkittler about 4 years ago

Tags set to alert
Category set to Regressions/Crashes

Updated by mkittler about 4 years ago

Assignee set to mkittler

Updated by mkittler about 4 years ago

PR: https://github.com/os-autoinst/openQA/pull/3845

Updated by mkittler about 4 years ago

Status changed from New to In Progress

Updated by okurz about 4 years ago

Target version set to Ready

PR merged.

Updated by openqa_review about 4 years ago

Due date set to 2021-04-30

Setting due date based on mean cycle time of SUSE QE Tools

Updated by mkittler about 4 years ago

Status changed from In Progress to Feedback

Updated by mkittler about 4 years ago

Status changed from Feedback to In Progress

My fix only landed on OSD with today's deployment, so I'm re-triggering the last limit_results_and_logs job now. It is currently still active. I hope it works, because results have been piling up and we have already received the filesystem alert.

I'll also check which group has been configured with an infinite retention period. That configuration triggered the bug, and even with the bug fixed it is questionable to keep such a configuration.
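
For readers unfamiliar with this class of failure, the following is a minimal sketch in Python of how an "infinite retention" special case can break a cleanup task. It is an analogy only and not the openQA code (which is Perl); the names `ResultSet`, `expired_jobs_buggy` and `expired_jobs_fixed`, and the use of `None` for "keep forever", are assumptions made for illustration. The Perl message "Can't call method ... on unblessed reference" means a method was invoked on a plain reference instead of an object; the Python counterpart is an `AttributeError` when calling `.all()` on a plain list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultSet:
    """Hypothetical stand-in for a query result object exposing all()."""
    rows: list

    def all(self) -> list:
        return list(self.rows)


def expired_jobs_buggy(job_ages_in_days: list, keep_days: Optional[int]):
    # Assumption: None means "keep results forever".
    if keep_days is None:
        return []  # bug: plain list, but callers expect an object with all()
    return ResultSet([age for age in job_ages_in_days if age > keep_days])


def expired_jobs_fixed(job_ages_in_days: list, keep_days: Optional[int]) -> ResultSet:
    if keep_days is None:
        return ResultSet([])  # same return type in every branch
    return ResultSet([age for age in job_ages_in_days if age > keep_days])


ages = [1, 10, 40]

# With a finite retention period both variants behave the same.
print(expired_jobs_buggy(ages, 30).all())
print(expired_jobs_fixed(ages, 30).all())

# With infinite retention only the fixed variant survives the all() call.
try:
    expired_jobs_buggy(ages, None).all()
except AttributeError as error:
    print(f"cleanup failed: {error}")
print(expired_jobs_fixed(ages, None).all())
```

The point of the sketch is that the special case is rare, so the inconsistent return type goes unnoticed until one group is configured that way.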

Actions

Copy link

Updated by mkittler about 4 years ago

The job group https://openqa.suse.de/admin/job_templates/276 had no (that is, infinite) retention periods configured for results. I've just set them to match its parent group (see https://openqa.suse.de/admin/job_templates/276), as we cannot keep the results around forever.

Meanwhile, the cleanup is still running but has already brought the results below the alert threshold.
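
A check like the one described above could be sketched roughly as follows. This is a hypothetical Python illustration and does not query the openQA database: the `groups` mapping, its example values and the function name `groups_without_retention` are made up, and `None` is assumed to stand for "no/infinite retention".

```python
# Hypothetical data: group id -> (parent id or None, keep-results days or None).
# None as the retention value stands for "no/infinite retention" here.
groups = {
    1: (None, 30),
    2: (1, None),
    3: (1, 14),
}


def groups_without_retention(groups: dict) -> list:
    """Return (group id, parent's retention in days) for groups keeping results forever."""
    findings = []
    for group_id, (parent_id, keep_days) in sorted(groups.items()):
        if keep_days is not None:
            continue  # a finite retention period is configured, nothing to flag
        # Suggest the parent's value if the group has a known parent.
        parent_days = groups[parent_id][1] if parent_id in groups else None
        findings.append((group_id, parent_days))
    return findings


print(groups_without_retention(groups))  # prints [(2, 30)]
```

Each finding pairs a group that would keep results forever with the value its parent uses, which mirrors the manual change of setting the group to match its parent.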

Updated by mkittler about 4 years ago

Status changed from In Progress to Resolved

The cleanup task succeeded again and we're still below the alert threshold.
