action #179038
Updated by okurz 15 days ago
## Motivation
Currently, git_clone minion jobs fail when GitLab is temporarily unreachable (see #178492). By introducing a proper error-handling mechanism, we can ensure:
- Temporary outages do not cause unnecessary job failures or alerts.
- Jobs completely ignore short outages or mark themselves as "skipped" instead of failing.
- A simple tracking mechanism detects and alerts on longer GitLab downtimes.
### User Story
```
"As a test engineer and openQA operator,
I want openQA to handle short-lived GitLab outages without causing mass Minion job failures,
so that users do not experience unnecessary disruption, while prolonged outages are still detected and reported effectively."
```
## Acceptance Criteria
* **AC1:** Temporary outages of remote git repositories (e.g. GitLab) don't cause failing minion jobs
* **AC2:** A mechanism exists to discover longer-term outages of remote git repositories (e.g. GitLab), and an update of the repositories is still ensured after shorter failed requests, e.g. in the range of seconds
## Suggestions
* Damage is likely limited. If we can't sync needles, nobody can edit needles.
* Jobs end up incomplete if there's an ongoing issue with git_clone minion jobs.
* Add a temporary file as part of the git clone OR somewhere in /var/lib/ where we have existing caching mechanisms.
* Check if the file is n seconds old to determine the state.
* We could decide to eventually give up and continue anyway and let jobs run.
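
The marker-file idea from the suggestions above could be sketched roughly as follows. This is only an illustration in Python, not openQA code: the function names, the state names and the `OUTAGE_GRACE_SECONDS` threshold are all hypothetical, and it assumes the age of the file is taken from its modification time, set when the first failed fetch is seen.

```python
import os
import time

# Hypothetical threshold ("n seconds" in the suggestion above).
OUTAGE_GRACE_SECONDS = 600


def record_failure(marker_path, now=None):
    """Create the marker on the first failed fetch.

    Later failures leave the file untouched, so its mtime keeps
    pointing at the start of the outage.
    """
    now = time.time() if now is None else now
    if not os.path.exists(marker_path):
        open(marker_path, "w").close()
        os.utime(marker_path, (now, now))


def clear_failure(marker_path):
    """Remove the marker after a successful fetch."""
    try:
        os.remove(marker_path)
    except FileNotFoundError:
        pass


def outage_state(marker_path, grace=OUTAGE_GRACE_SECONDS, now=None):
    """Return 'ok', 'short_outage' or 'long_outage' from the marker age."""
    now = time.time() if now is None else now
    try:
        started = os.path.getmtime(marker_path)
    except FileNotFoundError:
        return "ok"
    return "long_outage" if now - started >= grace else "short_outage"
```

In such a scheme a git_clone job would call `record_failure` when the remote is unreachable and `clear_failure` after a successful fetch. A `short_outage` result would let the job finish as skipped, while `long_outage` would be the point to alert, or to give up and let jobs run anyway.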