action #98499

closed

[alert] web UI: Too many Minion job failures alert size:S

Added by okurz about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2021-09-13
Due date:
% Done: 0%
Estimated time:
Description

Observation

The alert was received on 2021-09-13 while OSD was being deployed. https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=19&orgId=1&from=now-7d&to=now shows that 20 Minion jobs had already failed before the deployment. During the deployment, which could be a coincidence, another Minion job failed, bringing the total to 21 failed Minion jobs.

Acceptance criteria

  • AC1: At least one ticket exists for each distinct issue
  • AC2: The alert description mentions the tickets for all known issues that could explain failures

Suggestions

  • Review current failures and ensure that a ticket exists for each type (see related tickets)
  • Remove all failed Minion jobs after ensuring each problem is recorded in a ticket
  • Unpause alert
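
The review step above can be sketched as a small script that groups failed Minion jobs by task and reports tasks that do not yet have a ticket. This is only an illustration under assumptions: the record layout (`id`, `task`), the sample data, and the task name `example_task` are hypothetical, and only the `save_needle` to #70774 mapping is taken from the related issues of this ticket. Real records would have to come from the Minion dashboard or job listing.

```python
from collections import defaultdict

# Mapping from Minion task name to the ticket tracking its failures.
# Only save_needle -> #70774 comes from this ticket; extend as tickets are filed.
KNOWN_TICKETS = {"save_needle": 70774}


def group_failures_by_task(failed_jobs):
    """Group failed Minion job records by task name (task -> list of job ids)."""
    groups = defaultdict(list)
    for job in failed_jobs:
        groups[job["task"]].append(job["id"])
    return dict(groups)


def untracked_tasks(failed_jobs, known_tickets=KNOWN_TICKETS):
    """Return the sorted task names that have failures but no ticket yet."""
    groups = group_failures_by_task(failed_jobs)
    return sorted(task for task in groups if task not in known_tickets)


# Made-up sample records (hypothetical ids and task names).
sample = [
    {"id": 101, "task": "save_needle"},
    {"id": 102, "task": "save_needle"},
    {"id": 103, "task": "example_task"},
]

print(untracked_tasks(sample))
```

Any task reported by `untracked_tasks` would need a ticket before the failed jobs are removed and the alert is unpaused.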

Rollback measures

  • Unpause alert

Related issues 2 (2 open, 0 closed)

Related to openQA Project (public) - coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert (New, 2020-09-01)
Related to openQA Project (public) - action #70774: save_needle Minion tasks fail frequently and needles could get lost (New, 2020-09-01)
