action #100503

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert

Identify all "finalize_job_results" failures and handle them (report ticket or fix)

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2021-10-07
Due date:
% Done: 0%
Estimated time:

Description

Motivation

We have failed Minion jobs and related alerts. If a job failed only because of a user-provided hook script, it would not be our business, but so far we have no such hook scripts on OSD. We would know if we did, because they have to be configured via salt. Mostly we configure the hook script for the investigation ourselves and want to be informed about problems with it. So we want to consider these failures after all.

Acceptance criteria

  • AC1: All recent "finalize_job_results" failures are investigated and handled accordingly (ticket reported or fixed)

Suggestions

Out of scope

  • result: 'Job terminated unexpectedly (exit code: 0, signal: 15)' (#99831)