action #179038

Updated by okurz 15 days ago

## Motivation 
 Currently, git_clone minion jobs fail when GitLab is temporarily unreachable (see #178492). By introducing a proper error-handling mechanism, we can ensure that: 
 - Temporary outages do not cause unnecessary job failures or alerts. 
 - Jobs either ignore short outages completely or mark themselves as "skipped" instead of failing. 
 - A simple tracking mechanism detects longer GitLab downtimes and alerts on them. 

 ### User Story 
 ``` 
 "As a test engineer and openQA operator, 
 I want openQA to handle short-lived GitLab outages without causing mass Minion job failures, 
 so that users do not experience unnecessary disruption, while prolonged outages are still detected and reported effectively." 
 ``` 

 ## Acceptance Criteria 
 * **AC1:** Temporary GitLab outages don't cause failing minion jobs 
 * **AC2:** A mechanism exists to discover longer-term outages of GitLab, while an update of remote git repositories is still ensured on shorter failed requests, e.g. in the range of seconds 

 ## Suggestions 
 * Damage is likely limited. If we can't sync needles, nobody can edit needles. 
 * Jobs end up incomplete if there's an ongoing issue with git_clone minion jobs. 
 * Add a temporary file as part of the git clone OR somewhere in /var/lib/ where we have existing caching mechanisms. 
 * Check if the file is n seconds old to determine the state. 
 * We could decide to eventually give up and continue anyway and let jobs run.
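
 The marker-file idea from the suggestions could look roughly like the sketch below. It is illustrative Python, not openQA code: the function names, the `"skip"`/`"alert"` return values and the `grace_seconds` parameter are all assumptions made for this example. For the sake of deterministic testing the sketch stores the outage start time in the file body; checking the file's modification time, as the suggestion words it, would work the same way.

```python
import time
from pathlib import Path
from typing import Optional


def classify_outage(marker: Path, grace_seconds: float,
                    now: Optional[float] = None) -> str:
    """Decide how to treat a failed git request.

    Returns "skip" while the outage is younger than grace_seconds
    and "alert" once it has lasted longer. Both labels are hypothetical.
    """
    now = time.time() if now is None else now
    if not marker.exists():
        # First failed request: remember when the outage started.
        marker.write_text(str(now))
        return "skip"
    started = float(marker.read_text())
    return "skip" if now - started < grace_seconds else "alert"


def clear_outage(marker: Path) -> None:
    """Remove the marker after a successful git operation."""
    marker.unlink(missing_ok=True)
```

 A git_clone job would call `classify_outage` whenever the remote is unreachable and `clear_outage` after every successful request, so that only an outage lasting longer than `grace_seconds` without interruption ends in `"alert"`.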
