action #156625

closed

[alert] Scripts CI pipeline failing due to osd yielding 503 - take 2 size:M

Added by okurz 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

After #156052 we still see failures such as https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2344558, which logs:

Job state of job ID 13715326: running, waiting …
{"blocked_by_id":null,"id":13715326,"result":"none","state":"running"}
Job state of job ID 13715326: running, waiting …
Request failed, hit error 503, retrying up to 60 more times after waiting …
…
Request failed, hit error 503, retrying up to 1 more times after waiting …
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
<p>Additionally, a 503 Service Unavailable
error was encountered while trying to use an ErrorDocument to handle the request.</p>
<hr>
<address>Apache/2.4.51 (Linux/SUSE) Server at openqa.suse.de Port 80</address>
</body></html>

That is possibly a retry spanning multiple minutes, but something is still off here.

Acceptance criteria

  • AC1: (vague) openqa-cli waits sufficiently long to cover usual OSD outages
  • AC2: The retry-functionality in openqa-cli was double-verified and works as intended

Suggestions

  • Test the openqa-cli behaviour maybe together with an apache proxy on a local installation
  • Check if the retry actually properly sleeps in between
  • Consider adding exponential backoff to openqa-cli, see https://github.com/okurz/retry/blob/main/retry#L49
  • Consider adding a timestamp to the gitlab CI pipeline output
  • Consider outputting the value of OPENQA_CLI_RETRY_SLEEP_TIME_S in the "Request failed, hit error ..., retrying up to ... more times after waiting" line
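
Several of the suggestions above (sleeping between retries, exponential backoff, a timestamp, and printing the sleep time) can be illustrated together. The following is only a minimal Python sketch of the idea, not the actual openqa-cli implementation: the function name retry_with_backoff, its parameters and default values, and the request callable returning an HTTP status code are all hypothetical. The sleep function is injectable so that a test can check that a sleep really happens between attempts.

```python
import time
from datetime import datetime, timezone


def retry_with_backoff(request, retries=60, base_sleep_s=3.0, factor=2.0,
                       max_sleep_s=300.0, sleep=time.sleep, log=print):
    """Call request() until it returns something other than 503 or retries run out.

    `request` is a hypothetical callable returning an HTTP status code.
    All names and defaults are illustrative assumptions, not openqa-cli settings.
    Returns the last status code seen.
    """
    delay = base_sleep_s
    status = request()
    while status == 503 and retries > 0:
        # Timestamp each message so the total retry duration is visible in CI logs.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        log(f"[{stamp}] Request failed, hit error {status}, retrying up to "
            f"{retries} more times after waiting {delay:g} s")
        sleep(delay)
        retries -= 1
        # Exponential backoff: grow the delay each round, capped at max_sleep_s.
        delay = min(delay * factor, max_sleep_s)
        status = request()
    return status
```

Passing a recording function such as a list's append method as sleep lets a test assert the exact sequence of delays without actually waiting.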

Related issues: 1 (0 open, 1 closed)

Copied from openQA Project - action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S (Resolved, mkittler, 2024-02-26, 2024-03-13)
