action #104116

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert

coordination #99831: [epic] Better handle minion tasks failing with "Job terminated unexpectedly"

Better handle minion tasks failing with "Job terminated unexpectedly" - "scan_needles" size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: The minion job "scan_needles" has a SIGTERM handler that decides how to shut down cleanly within a reasonable time
  • AC2: Our minion job lists on OSD and O3 do not show any "Job terminated unexpectedly" failures for "scan_needles" over multiple deployments (or service restarts)
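
As an illustration of AC1, here is a minimal sketch of the general pattern: the task installs a SIGTERM handler that only sets a flag, and the work loop checks that flag between units of work so it can stop at a safe point. It is written in Python purely for illustration and is not the openQA implementation. The function name, the `on_needle` callback and the returned state strings are hypothetical.

```python
import signal


def scan_needles(needle_paths, on_needle):
    """Hypothetical stand-in for the task: process needles until asked to stop.

    Returns a (state, processed_count) tuple. "retry" means the task stopped
    early because SIGTERM arrived; "finished" means all needles were handled.
    Both state names are placeholders chosen for this sketch.
    """
    stop_requested = False

    def handle_sigterm(signum, frame):
        # Only record the request; the loop below decides when to stop.
        nonlocal stop_requested
        stop_requested = True

    # Install the handler for the duration of the task (main thread only).
    previous = signal.signal(signal.SIGTERM, handle_sigterm)
    processed = 0
    try:
        for path in needle_paths:
            if stop_requested:
                # Stop between needles so no needle is left half-processed.
                return ("retry", processed)
            on_needle(path)
            processed += 1
        return ("finished", processed)
    finally:
        # Restore whatever handler was active before the task started.
        signal.signal(signal.SIGTERM, previous)
```

Checking the flag once per needle keeps the shutdown delay bounded by the time needed for a single needle, which is one way to read "in a reasonable time".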

Suggestions

  • Implement sigterm handler for "scan_needles"
  • Test on o3 and osd either by manually restarting openqa-gru multiple times or awaiting the result from multiple deployments and checking the minion dashboard, e.g. openqa.suse.de/minion/jobs?state=failed
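
The dashboard check in the second suggestion could also be expressed as a small filter over job records. This sketch assumes the jobs are already available as a list of dictionaries with `task`, `state` and `result` keys. How those records are obtained is left open, and the helper name is hypothetical.

```python
def unexpected_terminations(jobs, task="scan_needles"):
    """Return failed jobs of `task` whose result mentions abrupt termination.

    `jobs` is assumed to be an iterable of dicts with "task", "state" and
    "result" keys; this shape is an assumption made for the sketch.
    """
    marker = "Job terminated unexpectedly"
    return [
        job for job in jobs
        if job.get("task") == task
        and job.get("state") == "failed"
        and marker in str(job.get("result", ""))
    ]
```

An empty return value after several deployments or restarts would correspond to AC2 being met for "scan_needles".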

Related issues 2 (0 open, 2 closed)

Copied from openQA Project - action #103416: Better handle minion tasks failing with "Job terminated unexpectedly" - "limit_results_and_logs" size:M (Resolved, mkittler, 2021-12-02)

Copied to openQA Project - action #107533: Better handle minion tasks failing with "Job terminated unexpectedly" - "finalize_job_results" size:M (Resolved, mkittler)

Actions #1

Updated by okurz over 2 years ago

  • Copied from action #103416: Better handle minion tasks failing with "Job terminated unexpectedly" - "limit_results_and_logs" size:M added
Actions #2

Updated by okurz over 2 years ago

  • Status changed from New to Feedback
  • Assignee set to okurz
Actions #3

Updated by okurz over 2 years ago

  • Status changed from Feedback to Resolved

The change was merged and deployed, and another deployment was conducted afterwards. After that deployment, https://openqa.suse.de/minion/jobs?state=failed shows no entries. This is great!

Actions #4

Updated by okurz about 2 years ago

  • Copied to action #107533: Better handle minion tasks failing with "Job terminated unexpectedly" - "finalize_job_results" size:M added