action #179185
coordination #154777: [saga][epic] Shareable os-autoinst and test distribution plugins
coordination #162131: [epic] future version control related features in openQA
Detection of long-time remote git clone outages size:S
Start date: 2025-03-17
Due date:
% Done: 0%
Estimated time:
Description
Motivation
Currently, git_clone minion jobs fail when GitLab is temporarily unreachable (see #178492). By introducing a proper error-handling mechanism, we can ensure:
- A simple tracking mechanism to detect and alert on longer GitLab downtimes.
User Story
"As a test engineer and openQA operator,
I want openQA to handle short-lived GitLab outages without causing mass Minion job failures,
so that users do not experience unnecessary disruption, while prolonged outages are still detected and reported effectively."
Acceptance Criteria
- AC1: Temporary remote git outages don't cause failing minion jobs
- AC2: An update of remote git repositories is still ensured after shorter failed requests, e.g. outages in the range of seconds
- AC3: Longer remote git unavailabilities trigger an alert
- AC4: Details about the longer remote git unavailabilities are available on openQA side, e.g. in the minion job details from the git call error (Internal API unreachable)
Suggestions
- Damage is likely limited. If we can't sync needles, nobody can edit needles.
- Add a temporary file as part of the git clone OR somewhere in /var/lib/ where we have existing caching mechanisms.
- Add a mechanism to discover longer-term outages of GitLab
- Check if the file is n seconds old to determine the state
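The marker-file idea from the suggestions above could look roughly like the following. This is only a minimal sketch in Python for illustration (openQA itself is written in Perl), and the function names, the marker location and the threshold are all hypothetical, since the ticket does not specify them. The assumption is that the file is created on the first failed git call, left untouched on further failures, and removed on the next success, so its age equals the outage duration.

```python
import time
from pathlib import Path
from typing import Optional


def record_git_result(success: bool, marker: Path) -> None:
    """Track outage state in a marker file (hypothetical helper).

    The marker is created on the first failure only, so its mtime
    marks the start of the outage. Any success removes it.
    """
    if success:
        marker.unlink(missing_ok=True)
    elif not marker.exists():
        marker.touch()


def outage_age(marker: Path, now: Optional[float] = None) -> Optional[float]:
    """Return seconds since the first recorded failure, or None if no outage."""
    try:
        started = marker.stat().st_mtime
    except FileNotFoundError:
        return None
    current = time.time() if now is None else now
    return max(0.0, current - started)


def is_long_outage(marker: Path, threshold: float,
                   now: Optional[float] = None) -> bool:
    """True if an outage is recorded and has lasted at least `threshold` seconds."""
    age = outage_age(marker, now)
    return age is not None and age >= threshold


if __name__ == "__main__":
    import tempfile

    # Hypothetical marker location; a real one might live under /var/lib/.
    marker = Path(tempfile.mkdtemp()) / "git_outage_since"
    record_git_result(False, marker)  # first failed git call
    print(is_long_outage(marker, threshold=3600))  # outage just started
    record_git_result(True, marker)   # remote is reachable again
    print(outage_age(marker))
```

A monitoring check could then alert whenever `is_long_outage` returns true, while individual minion jobs only retry or skip during shorter outages. Whether the state is better kept in a file or in the database is left open here.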
Updated by okurz 3 months ago
- Copied from action #179038: Gracious handling of longer remote git clones outages size:S added
Updated by okurz about 2 months ago
- Target version changed from Tools - Next to future