action #174583

Updated by livdywan 9 days ago

### Observation 

 The pipeline is failing because the openQA jobs got obsoleted: 

 See: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/3562638 

 ``` 
 {"blocked_by_id":null,"id":4713396,"result":"obsoleted","state":"done"} 
 {"blocked_by_id":null,"id":4713397,"result":"obsoleted","state":"done"} 
 ``` 

 The multimachine case looks a bit more involved, e.g. https://gitlab.suse.de/openqa/scripts-ci/-/jobs/3625091 : 

 ``` 
 {"blocked_by_id":null,"id":16374878,"result":"skipped","state":"cancelled"} 
 {"blocked_by_id":null,"id":16374879,"result":"timeout_exceeded","state":"done"} 
 ``` 

 ### Acceptance Criteria 
 * **AC1**: Obsoleted jobs don't cause failures in GitLab pipelines 

 ### Suggestions 
 * ~~Verify if this is a specific worker or workers and take them out of production~~ 
 * ~~Consider restarting affected jobs~~ 
 * An "obsoleted" result should be considered part of expected behavior. How about a new openQA API route to follow job obsolescence? 
 * Ignore the case of "obsoleted" jobs as the pipeline runs frequently enough anyway. 
 * Check whether we cancel the full parallel cluster in case a job in it is cancelled/obsoleted as we also saw jobs with parallel dependencies ending up with the result "timeout_exceeded". 
   * Treat skipped/cancelled the same as obsoleted (and ignore it) 
   * Ensure this is logged in case it is not always the case 
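
The "ignore obsoleted, and treat skipped/cancelled the same" suggestions could look roughly like the sketch below. It is only an illustration, not the actual scripts-ci code: the names `IGNORED_RESULTS`, `PASSING_RESULTS`, `classify` and `pipeline_should_fail` are hypothetical, and it assumes the monitoring step receives one JSON object per line in the shape shown in the observation above.

```python
import json

# Assumption: results the pipeline may ignore, per the suggestions above.
IGNORED_RESULTS = {"obsoleted", "skipped"}
# Assumption: results that count as success for the pipeline.
PASSING_RESULTS = {"passed", "softfailed"}


def classify(job: dict) -> str:
    """Return 'pending', 'ignored', 'ok' or 'failed' for one job record."""
    if job.get("state") not in ("done", "cancelled"):
        return "pending"  # job has not reached a final state yet
    result = job.get("result")
    if result in IGNORED_RESULTS:
        return "ignored"
    if result in PASSING_RESULTS:
        return "ok"
    return "failed"


def pipeline_should_fail(lines) -> bool:
    """Fail only if at least one finished job has a non-ignorable bad result."""
    jobs = [json.loads(line) for line in lines if line.strip()]
    for job in jobs:
        verdict = classify(job)
        # Log every decision so ignored jobs stay visible in the job output.
        print(f"job {job.get('id')}: {job.get('result')} -> {verdict}")
    return any(classify(job) == "failed" for job in jobs)


obsoleted_case = [
    '{"blocked_by_id":null,"id":4713396,"result":"obsoleted","state":"done"}',
    '{"blocked_by_id":null,"id":4713397,"result":"obsoleted","state":"done"}',
]
multimachine_case = [
    '{"blocked_by_id":null,"id":16374878,"result":"skipped","state":"cancelled"}',
    '{"blocked_by_id":null,"id":16374879,"result":"timeout_exceeded","state":"done"}',
]
```

With this sketch the obsoleted-only case no longer fails the pipeline, while the multimachine case still does, because "timeout_exceeded" on the parallel sibling is not ignored. Whether that sibling should be ignored as well depends on the outcome of the parallel-cluster check suggested above.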

 ### Mitigations 
 * **DONE** Pause [affected pipelines on GitLab](https://gitlab.suse.de/openqa/scripts-ci/-/pipeline_schedules) i.e. openqa-schedule-mm-ping-test o3/osd
