action #156625
Updated by okurz about 1 year ago
## Observation
After #156052 we still have a case https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2344558 like this:
```
Job state of job ID 13715326: running, waiting …
{"blocked_by_id":null,"id":13715326,"result":"none","state":"running"}
Job state of job ID 13715326: running, waiting …
Request failed, hit error 503, retrying up to 60 more times after waiting …
…
Request failed, hit error 503, retrying up to 1 more times after waiting …
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
<p>Additionally, a 503 Service Unavailable
error was encountered while trying to use an ErrorDocument to handle the request.</p>
<hr>
<address>Apache/2.4.51 (Linux/SUSE) Server at openqa.suse.de Port 80</address>
</body></html>
```
That is possibly a retry spanning multiple minutes, but something is still off here.
## Acceptance criteria
* **AC1:** (vague) openqa-cli waits sufficiently long to cover usual OSD outages
* **AC2:** The retry-functionality in openqa-cli was double-verified and works as intended
## Suggestions
* Test the openqa-cli behaviour, maybe together with an Apache proxy on a local installation
* Check whether the retry actually sleeps properly between attempts
* Consider adding exponential backoff to openqa-cli, see https://github.com/okurz/retry/blob/main/retry#L49
* Consider adding a timestamp to the GitLab CI pipeline output
* Consider outputting the value of `OPENQA_CLI_RETRY_SLEEP_TIME_S` in the `Request failed, hit error ..., retrying up to ... more times after waiting` line
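
To make the last few suggestions concrete, here is a minimal sketch of a retry loop with exponential backoff that also prints a timestamp and the sleep time in each retry message. It is not the openqa-cli implementation. `ServiceUnavailable`, `backoff_delays`, `retry_with_backoff` and all parameter defaults are hypothetical names and values chosen for illustration only. The sleep function is injectable, so a test can check that the loop really sleeps between attempts without waiting.

```python
import time
from datetime import datetime, timezone


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 response."""


def backoff_delays(retries, base_s, factor=2.0, max_s=300.0):
    """Return the sleep time before each retry, growing by `factor`, capped at max_s."""
    return [min(base_s * factor ** i, max_s) for i in range(retries)]


def retry_with_backoff(request, retries=5, base_s=3.0, factor=2.0, max_s=300.0,
                       sleep=time.sleep, log=print):
    """Call request() once, then up to `retries` more times on ServiceUnavailable.

    `sleep` and `log` are injectable so the behaviour can be verified in tests.
    """
    delays = backoff_delays(retries, base_s, factor, max_s)
    for attempt, delay in enumerate(delays):
        try:
            return request()
        except ServiceUnavailable:
            remaining = retries - attempt
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            # Message modelled on the CI log above, plus timestamp and sleep time.
            log(f"[{stamp}] Request failed, hit error 503, retrying up to "
                f"{remaining} more times after waiting {delay:g} s")
            sleep(delay)
    # Final attempt: if it still fails, let the error propagate to the caller.
    return request()
```

With these assumed defaults, `backoff_delays(5, 3.0)` gives waits of 3, 6, 12, 24 and 48 seconds, so the total covered outage grows much faster than with a fixed sleep for the same number of retries. Whether that is long enough for usual OSD outages (AC1) would still need to be checked against real outage durations.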