action #156625

Updated by okurz 2 months ago

## Observation 
 After #156052 we still see a failure in https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2344558 like this:


 ``` 
 Job state of job ID 13715326: running, waiting … 
 {"blocked_by_id":null,"id":13715326,"result":"none","state":"running"} 
 Job state of job ID 13715326: running, waiting … 
 Request failed, hit error 503, retrying up to 60 more times after waiting … 
 … 
 Request failed, hit error 503, retrying up to 1 more times after waiting … 
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
 <html><head> 
 <title>503 Service Unavailable</title> 
 </head><body> 
 <h1>Service Unavailable</h1> 
 <p>The server is temporarily unable to service your 
 request due to maintenance downtime or capacity 
 problems. Please try again later.</p> 
 <p>Additionally, a 503 Service Unavailable 
 error was encountered while trying to use an ErrorDocument to handle the request.</p> 
 <hr> 
 <address>Apache/2.4.51 (Linux/SUSE) Server at openqa.suse.de Port 80</address> 
 </body></html> 
 ``` 

 That is possibly a retry spanning multiple minutes, but something still seems off here.
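Whether "multiple minutes" is plausible can be estimated from the retry count and the sleep between attempts. The following minimal Python sketch ignores the duration of the requests themselves. The 60 retries come from the log above; the 3-second sleep is an assumed value for `OPENQA_CLI_RETRY_SLEEP_TIME_S` and is not confirmed by this job's output.

```python
def total_wait_seconds(retries: int, sleep_s: float) -> float:
    """Time spent sleeping with a constant delay before each retry.

    Request duration is ignored; each retry is preceded by exactly one sleep.
    """
    return retries * sleep_s


# 60 retries is taken from the log; 3 s is a hypothetical sleep time.
RETRIES = 60
SLEEP_S = 3.0  # assumed value of OPENQA_CLI_RETRY_SLEEP_TIME_S

if __name__ == "__main__":
    wait = total_wait_seconds(RETRIES, SLEEP_S)
    print(f"{RETRIES} retries x {SLEEP_S:g} s = {wait:g} s ({wait / 60:g} min)")
```

Under that assumption the whole retry window is about 3 minutes, which may be shorter than a typical OSD outage (see AC1).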

 ## Acceptance criteria 
 * **AC1:** (vague) openqa-cli waits sufficiently long to cover usual OSD outages 
 * **AC2:** The retry-functionality in openqa-cli was double-verified and works as intended 

 ## Suggestions 
 * Test the openqa-cli behaviour, possibly together with an Apache proxy, on a local installation
 * Check whether the retry actually sleeps properly between attempts
 * Consider adding exponential backoff to openqa-cli, see https://github.com/okurz/retry/blob/main/retry#L49
 * Consider adding a timestamp to the GitLab CI pipeline output
 * Consider outputting the value of `OPENQA_CLI_RETRY_SLEEP_TIME_S` in the `Request failed, hit error ..., retrying up to ... more times after waiting` line
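The suggestions above (exponential backoff, a timestamp, and the sleep value in the retry message) could look roughly like the following Python sketch. It is not openqa-cli's actual implementation: `do_request`, the factor of 2 and the 300 s cap are hypothetical choices, and the exact formula used by the linked `retry` script is not reproduced here. The `sleep` parameter is injectable so a test can confirm that the code really sleeps between attempts.

```python
import time
from datetime import datetime, timezone
from typing import Callable, List


def backoff_delays(retries: int, base_s: float, factor: float = 2.0,
                   max_s: float = 300.0) -> List[float]:
    """Delay before each retry: base_s, base_s*factor, ..., capped at max_s."""
    return [min(base_s * factor**i, max_s) for i in range(retries)]


def request_with_retry(
    do_request: Callable[[], int],
    retries: int,
    base_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call do_request() until it returns a non-5xx status or retries run out.

    do_request is a hypothetical stand-in that returns an HTTP status code.
    4xx and 2xx statuses are returned immediately without retrying.
    """
    delays = backoff_delays(retries, base_s)
    status = do_request()
    for attempt, delay in enumerate(delays):
        if status < 500:
            return status
        remaining = retries - attempt
        # Timestamp plus the actual sleep value make the wait visible in CI logs.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} Request failed, hit error {status}, retrying up to "
              f"{remaining} more times after waiting {delay:g} s")
        sleep(delay)
        status = do_request()
    return status


if __name__ == "__main__":
    # Simulated server: three 503 responses, then success.
    statuses = iter([503, 503, 503, 200])
    slept: List[float] = []
    result = request_with_retry(lambda: next(statuses), retries=5, base_s=1.0,
                                sleep=slept.append)
    print(result, slept)  # 200 [1.0, 2.0, 4.0]
```

With `retries=60`, the message counts down from "up to 60 more times" to "up to 1 more times", matching the log above, while the doubling delay lets the same retry count cover a much longer outage than a constant sleep.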
