action #179185

open

coordination #154777: [saga][epic] Shareable os-autoinst and test distribution plugins

coordination #162131: [epic] future version control related features in openQA

Detection of long-time remote git clone outages size:S

Added by okurz 3 months ago. Updated about 2 months ago.

Status: Workable
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2025-03-17
Due date:
% Done: 0%
Estimated time:
Description

Motivation

Currently, git_clone minion jobs fail when GitLab is temporarily unreachable (see #178492). By introducing a proper error-handling mechanism, we can ensure:

  • A simple tracking mechanism to detect and alert on longer GitLab downtimes.

User Story

"As a test engineer and openQA operator,
I want openQA to handle short-lived GitLab outages without causing mass Minion job failures,
so that users do not experience unnecessary disruption, while prolonged outages are still detected and reported effectively."

Acceptance Criteria

  • AC1: Temporary remote git outages don't cause failing minion jobs
  • AC2: Remote git repositories are still updated after short failed requests, e.g. outages in the range of seconds
  • AC3: Longer periods of remote git unavailability trigger an alert
  • AC4: Details about longer periods of remote git unavailability are available on the openQA side, e.g. in the minion job details from the git call error (Internal API unreachable)
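
One way AC1 and AC2 could be approached is to retry the git call a few times with a short delay before giving up. The following is only a minimal sketch in Python, not openQA code: the helper name `run_with_retries`, its parameters and the default values are hypothetical. It also assumes that every failed git call is transient, which a real implementation would likely need to refine.

```python
import subprocess
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: treats every CalledProcessError as a possibly
    transient outage and waits `delay_seconds` between attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except subprocess.CalledProcessError:
            if attempt == attempts:
                # Out of retries: let the caller see the last git error.
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

A caller could wrap a git update like `run_with_retries(lambda: subprocess.run(["git", "fetch", "origin"], check=True))`. The error raised after the last attempt still carries the git exit code, which could feed the details asked for in AC4.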

Suggestions

  • Damage is likely limited. If we can't sync needles nobody can edit needles.
  • Add a temporary file as part of the git clone OR somewhere in /var/lib/ where we have existing caching mechanisms.
  • Add a mechanism to discover longer-term outages of GitLab
  • Check if the file is n seconds old to determine the state
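
The marker file idea above could look roughly like this. It is a sketch of one possible interpretation, written in Python for illustration only: the file is created on the first failed git call, removed on the next success, and its age is taken as the outage duration. All function names and the threshold parameter are hypothetical, and the marker location is left to the caller.

```python
import os
import time
from pathlib import Path
from typing import Optional


def record_failure(marker: Path, now: Optional[float] = None) -> None:
    """Create the marker on the first failure; later failures keep its timestamp."""
    if marker.exists():
        return
    marker.touch()
    if now is not None:
        # Allow an explicit timestamp, mainly for testing.
        os.utime(marker, (now, now))


def record_success(marker: Path) -> None:
    """Remove the marker once the remote is reachable again."""
    marker.unlink(missing_ok=True)


def outage_seconds(marker: Path, now: Optional[float] = None) -> float:
    """Return how long the outage has lasted, or 0.0 if there is no marker."""
    try:
        started = marker.stat().st_mtime
    except FileNotFoundError:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, current - started)


def is_long_outage(marker: Path, threshold_seconds: float,
                   now: Optional[float] = None) -> bool:
    """True if the marker is at least `threshold_seconds` old (the "n seconds" check)."""
    return outage_seconds(marker, now) >= threshold_seconds
```

An alerting check would then only need to call `is_long_outage` periodically with the chosen threshold. Keeping the timestamp of the first failure rather than the latest one is what makes the file age reflect the whole outage.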

Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #179038: Gracious handling of longer remote git clones outages size:S | Resolved | mkittler | 2025-03-17

Actions #1

Updated by okurz 3 months ago

  • Copied from action #179038: Gracious handling of longer remote git clones outages size:S added
Actions #2

Updated by okurz about 2 months ago

  • Priority changed from Normal to Low
Actions #3

Updated by okurz about 2 months ago

  • Target version changed from Tools - Next to future