action #179885

open

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert

coordination #99831: [epic] Better handle minion tasks failing with "Job terminated unexpectedly"

Better handle minion tasks failing with "Job terminated unexpectedly" - OpenQA::Task::Git::Clone git_clone

Added by tinita 8 days ago. Updated 1 day ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Feature requests
Start date:
2025-04-02
Due date:
% Done:
0%

Estimated time:

Description

Acceptance criteria

  • AC1: The minion job "git_clone" has a SIGTERM handler that shuts the job down cleanly within a reasonable time
  • AC2: Our minion job lists on OSD and O3 do not show any "Job terminated unexpectedly" failures for "git_clone" over multiple deployments

Suggestions

  • Implement a SIGTERM handler for "git_clone" like it has already been done for other jobs (e.g. #103416)
  • Test on o3 and osd, either by manually restarting openqa-gru multiple times or by waiting for the results of multiple deployments, and check the minion dashboard, e.g. openqa.suse.de/minion/jobs?state=failed
    • (Note: I don't know who suggested this, but restarting openqa-gru multiple times in production can trigger the actual problem and leave us with incomplete jobs. Try it in a local instance first, or at least check and repair things afterwards if necessary)
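
To illustrate the first suggestion, here is a minimal sketch of the general pattern: catch SIGTERM while a long-running child process (such as a git clone) is active, stop the child within a grace period, and report that the work should be retried rather than dying unexpectedly. It is written in Python purely for illustration. openQA is implemented in Perl, and this is not the code from #103416. The names run_interruptible, TerminationRequested and grace_seconds, and the "finished"/"failed"/"retry" return values, are hypothetical.

```python
import signal
import subprocess
import sys


class TerminationRequested(Exception):
    """Raised in the main thread when SIGTERM arrives (hypothetical name)."""


def _on_sigterm(signum, frame):
    # Turn the signal into an exception so the blocking wait() is interrupted.
    raise TerminationRequested()


def run_interruptible(cmd, grace_seconds=5.0):
    """Run cmd and return "finished", "failed" or "retry".

    "retry" means SIGTERM arrived while the command was running; the child
    has been stopped and the caller can re-enqueue the work later.
    """
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    child = None
    try:
        child = subprocess.Popen(cmd)
        return "finished" if child.wait() == 0 else "failed"
    except TerminationRequested:
        # Ignore further SIGTERMs while cleaning up.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        if child is not None and child.poll() is None:
            child.terminate()  # ask the child to stop
            try:
                child.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                child.kill()  # grace period exceeded, force it
                child.wait()
        return "retry"
    finally:
        # Always restore whatever handler was installed before.
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    # Stand-in for a real "git clone" call; it just exits successfully.
    print(run_interruptible([sys.executable, "-c", "pass"]))
```

In a Minion task, the "retry" outcome would roughly correspond to retrying the job instead of letting the worker record it as "Job terminated unexpectedly". How partially cloned directories should be cleaned up is a separate question that this sketch does not address.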

Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #179873: Long delays due to "git_clone" tasks (Rejected, okurz, 2025-04-02)

Copied from openQA Project (public) - action #108980: Better handle minion tasks failing with "Job terminated unexpectedly" - OpenQA::Task::Asset::Download size:S (Resolved, ybonatakis, 2022-03-25)

Actions #1

Updated by tinita 8 days ago

  • Copied from action #108980: Better handle minion tasks failing with "Job terminated unexpectedly" - OpenQA::Task::Asset::Download size:S added
Actions #2

Updated by tinita 8 days ago

  • Related to action #179873: Long delays due to "git_clone" tasks added
Actions #3

Updated by okurz 8 days ago

  • Target version changed from Tools - Next to Ready
Actions #4

Updated by okurz 5 days ago

  • Target version changed from Ready to Tools - Next
Actions #5

Updated by tinita 2 days ago

  • Target version changed from Tools - Next to Ready
Actions #6

Updated by okurz 1 day ago

  • Status changed from New to Workable
Actions #7

Updated by tinita 1 day ago

  • Subject changed from Better handle minion tasks failing with "Job terminated unexpectedly" - OpenQA::Task::Git::Clone git_clone size:S to Better handle minion tasks failing with "Job terminated unexpectedly" - OpenQA::Task::Git::Clone git_clone
  • Status changed from Workable to New
Actions #8

Updated by tinita 1 day ago

Might have to wait for #179038

Actions #9

Updated by tinita 1 day ago

  • Description updated
Actions #10

Updated by okurz 1 day ago

  • Target version changed from Ready to Tools - Next