action #163766

Updated by okurz about 2 months ago

## Observation 
 We are seeing this pipeline failure and a couple of incomplete jobs today:
 ```
 Date: Thu, 11 Jul 2024 11:51:31 +0000
 From: "GitLab@SUSE" <gitlab@suse.de>
 To: osd-admins@suse.de
 Subject: Scripts CI | Failed pipeline for master | 33a115c3
 ```
 https://gitlab.suse.de/openqa/scripts-ci/-/pipelines/1207977 
 https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1720656188837&to=1720699027331&viewPanel=16 

 https://openqa.suse.de/tests/14894947 
 ``` 
 Result: incomplete, finished 10 minutes ago (ran for 36:46 minutes) 
 Reason: asset failure: Failed to download SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2 to /var/lib/openqa/cache/openqa.suse.de/SLES-15-SP5-x86_64-mru-install-minimal-with-addons-Build20240710-1-Server-DVD-Updates-64bit.qcow2 
 ``` 

 The detailed error message is visible in, e.g., https://openqa.suse.de/tests/14894946:
 ``` 
 [info] [#21729] Download error 598, waiting 5 seconds for next try (1 remaining) 
 ``` 

 (598 is a non-standard status code commonly used to signal a network read timeout; see https://http.dev/598)
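
For triage it can help to extract the status code and the remaining attempts from such retry lines. The following is a minimal sketch, assuming the log format matches the single sample line quoted above; the name `parse_download_error`, the field names and the regular expression are illustrative and not part of openQA:

```python
import re

# Pattern derived from the one sample line in this ticket; other log
# variants may exist and would need their own patterns.
DOWNLOAD_ERROR_RE = re.compile(
    r"\[(?P<level>\w+)\] \[#(?P<ident>\d+)\] Download error (?P<status>\d+), "
    r"waiting (?P<wait>\d+) seconds for next try \((?P<remaining>\d+) remaining\)"
)


def parse_download_error(line):
    """Return a dict of the fields in a download-retry log line, or None."""
    match = DOWNLOAD_ERROR_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so callers can compare them directly.
    for key in ("ident", "status", "wait", "remaining"):
        fields[key] = int(fields[key])
    return fields


sample = "[info] [#21729] Download error 598, waiting 5 seconds for next try (1 remaining)"
print(parse_download_error(sample))
```

Lines that do not match the pattern yield `None`, so the function can be applied to a whole log file and the results filtered.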

 ## Suggestions 
 * Look for affected jobs, e.g. using openqa-label-known-issues or by querying the database, and retrigger them
 * Given the coincidence in timing, this is very likely related to #163592, but assets should be served by NGINX directly (so the unresponsive Mojo web app should not have an impact)
 * Verify that asset downloads work independently of the web app (e.g. by briefly stopping the web app and trying to download the asset)
     * Maybe the web app is still involved for some redirection? That would be fine, but it would of course mean this is related to #163592.
     * Can we split that completely? 
 * Check how this timeout is handled by NGINX 
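
The first suggestion can be sketched as a filter over job records. This is a minimal sketch, assuming each job is available as a dict with `id`, `result` and `reason` keys (for example from a database export); the record layout, the sample records and the name `affected_jobs` are hypothetical, and the actual retrigger step is left out:

```python
import re

# Reason prefix taken from the incomplete job quoted in this ticket.
ASSET_FAILURE_RE = re.compile(r"^asset failure: Failed to download ")


def affected_jobs(jobs):
    """Return the IDs of incomplete jobs whose reason is an asset download failure."""
    return [
        job["id"]
        for job in jobs
        if job.get("result") == "incomplete"
        and ASSET_FAILURE_RE.search(job.get("reason") or "")
    ]


# Hypothetical records for illustration only.
jobs = [
    {"id": 1, "result": "incomplete",
     "reason": "asset failure: Failed to download example.qcow2 to /var/lib/openqa/cache/openqa.suse.de/example.qcow2"},
    {"id": 2, "result": "incomplete", "reason": "some other reason"},
    {"id": 3, "result": "passed", "reason": None},
]
print(affected_jobs(jobs))
```

Matching on the reason prefix rather than on a specific asset name keeps the filter usable for other assets that failed in the same time window.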


 ## Rollback actions 
 * Set pipeline status emails in https://gitlab.suse.de/openqa/scripts-ci/-/settings/integrations/pipelines_email/edit to "Active"
