action #179038

Updated by robert.richardson 17 days ago

## Motivation 
 Currently, git_clone minion jobs fail when GitLab is temporarily unreachable (see #178492, #178564). By introducing a proper error-handling mechanism, we can ensure: 
 - Temporary outages do not cause unnecessary job failures or alerts. 
 - Jobs completely ignore short outages or mark themselves as "skipped" instead of failing. 
 - A simple tracking mechanism detects and alerts on longer GitLab downtimes. 

 ### User Story 
 ``` 
 "As a test engineer and openQA operator, 
 I want openQA to handle short-lived GitLab outages without causing mass Minion job failures, 
 so that users do not experience unnecessary disruption, while prolonged outages are still detected and reported effectively." 
 ``` 

 ## Acceptance Criteria 
 * **AC1:** Temporary GitLab outages don't cause failing minion jobs 
 * **AC2:** A mechanism exists to discover longer-term outages of GitLab 

 ## Suggestions 
 * Damage is likely limited: if we can't sync needles, nobody can edit needles. 
 * Add a temporary file as part of the git clone OR somewhere in /var/lib/ where we have existing caching mechanisms. 
 * Check whether the file is older than n seconds to determine the state.
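
The marker-file idea from the suggestions could look roughly like the sketch below. It is an illustration in Python, not openQA code: the function names, the marker path and the threshold are all hypothetical. The first failed clone creates the file, later failures compare its age against the threshold, and a successful clone removes it.

```python
import os
import time
from pathlib import Path
from typing import Optional


def outage_duration(marker: Path, now: Optional[float] = None) -> float:
    """Create the marker on the first failure and return the outage age in seconds."""
    now = time.time() if now is None else now
    if not marker.exists():
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()
        # Pin the mtime to "now" so the age is measured from the first failure.
        os.utime(marker, (now, now))
    return now - marker.stat().st_mtime


def classify_failure(marker: Path, threshold_seconds: float,
                     now: Optional[float] = None) -> str:
    """Return "skip" for a short outage and "fail" once it exceeds the threshold."""
    if outage_duration(marker, now) >= threshold_seconds:
        return "fail"
    return "skip"


def clear_outage(marker: Path) -> None:
    """Remove the marker after a successful clone so the next outage starts fresh."""
    marker.unlink(missing_ok=True)
```

A job that cannot reach GitLab would call `classify_failure(marker, n)` and mark itself as skipped or failed accordingly. Only the "fail" result would trigger an alert, which covers AC2 without a separate monitoring component.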
