action #160877
Updated by livdywan 6 months ago
## Observation
We have a case where https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2649768 fails due to:
```
Job state of job ID 14429107: scheduled, waiting … (delay: 10; waited 70s)
{"blocked_by_id":null,"id":14429107,"result":"none","state":"scheduled"}
Job state of job ID 14429107: scheduled, waiting … (delay: 10; waited 80s)
Request failed, hit error 502, retrying up to 60 more times after waiting … (delay: 5; waited 0s)
...
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.5</center>
</body>
</html>
```
This happened again: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2655477
Did we manage to DoS the server? Do we need to tweak the nginx configuration even more?
## Suggestions
* We're already retrying up to 60 times, as the logs show, so more retries probably won't help
* Maybe this is a bug in `openqa-cli ... --monitor`
* Why haven't we seen this issue elsewhere?
* It seems to happen at roughly the same time each day, e.g. around 8 in the morning
* Unsilence [web UI: Too many 5xx HTTP responses alert](https://monitor.qa.suse.de/alerting/grafana/d949dbae-8034-4bf4-8418-f148dfcaf89d/view?orgId=1)
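The retry line in the log implies a fixed wait budget. The following is a minimal sketch of that budget. It assumes a fixed 5 s delay between attempts, because the log only shows the first `delay: 5` and the real client may back off differently. `request_with_retries`, `HTTPError`, `worst_case_wait` and `make_flaky_server` are hypothetical names for illustration, not the `openqa-cli` API. Under that assumption, 60 retries cover only about 300 s of consecutive 502 responses. If the gateway stays unavailable for longer, more retries only stretch the wait and do not address the cause.

```python
import time
from typing import Callable


class HTTPError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def request_with_retries(send: Callable[[], str], retries: int = 60, delay: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send(); on a 5xx error retry up to `retries` more times,
    waiting a fixed `delay` seconds between attempts (assumed, not confirmed)."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for remaining in range(retries, -1, -1):
        try:
            return send()
        except HTTPError as err:
            # Give up on non-5xx errors or once the retry budget is exhausted
            if err.status < 500 or remaining == 0:
                raise
            print(f"Request failed, hit error {err.status}, "
                  f"retrying up to {remaining} more times")
            sleep(delay)
    raise AssertionError("unreachable")


def worst_case_wait(retries: int, delay: float) -> float:
    """Total time spent sleeping before the final failed attempt."""
    return retries * delay


def make_flaky_server(failures: int) -> Callable[[], str]:
    """Hypothetical stand-in for the web UI: answers 502 for the first `failures` calls."""
    calls = {"n": 0}

    def send() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise HTTPError(502)
        return '{"state":"scheduled"}'

    return send


if __name__ == "__main__":
    # A short outage is absorbed by the retries...
    print(request_with_retries(make_flaky_server(3), sleep=lambda s: None))
    # ...but the whole budget only spans this many seconds of 502s
    print(worst_case_wait(60, 5.0))
```

Injecting `sleep` keeps the sketch testable without actually waiting; an outage of 61 or more consecutive 502 responses still fails regardless of how the delay is tuned.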