action #160877

Updated by livdywan 28 days ago

## Observation 

 We have a case where https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2649768 fails due to: 

 ``` 
 Job state of job ID 14429107: scheduled, waiting … (delay: 10; waited 70s) 
 {"blocked_by_id":null,"id":14429107,"result":"none","state":"scheduled"} 
 Job state of job ID 14429107: scheduled, waiting … (delay: 10; waited 80s) 
 Request failed, hit error 502, retrying up to 60 more times after waiting … (delay: 5; waited 0s) 
 ... 
 <html> 
 <head><title>502 Bad Gateway</title></head> 
 <body> 
 <center><h1>502 Bad Gateway</h1></center> 
 <hr><center>nginx/1.21.5</center> 
 </body> 
 </html> 
 ``` 

 This happened again: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2655477 

 Did we manage to DoS the server? Do we need to tweak the nginx configuration even more? 


 ## Suggestions 

 * We're already retrying 60 times as is visible in the logs - more retries probably won't help 
 * Maybe this could be a bug in `openqa-cli ... --monitor` 
 * How come we didn't see issues elsewhere? 
 * Seems to happen at roughly the same time of day, e.g. around 8 in the morning 
 * Unsilence [web UI: Too many 5xx HTTP responses alert](https://monitor.qa.suse.de/alerting/grafana/d949dbae-8034-4bf4-8418-f148dfcaf89d/view?orgId=1)
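
To put the first suggestion in numbers, here is a minimal sketch in Python. It assumes the retry logic is a plain fixed-delay loop, as the log line "retrying up to 60 more times after waiting … (delay: 5)" suggests. It is not the actual `openqa-cli` implementation, and `fetch_with_retries`, `max_outage_covered` and the `fetch` callable are hypothetical names used only for illustration.

```python
import time


class RetriesExhausted(Exception):
    """Raised when every retry still ended in a 5xx response."""


def fetch_with_retries(fetch, retries=60, delay=5, sleep=time.sleep):
    """Call fetch() until it returns a non-5xx status or retries run out.

    fetch is a hypothetical callable returning (status_code, body).
    The defaults mirror the values seen in the log; they are assumptions.
    """
    attempts_left = retries
    while True:
        status, body = fetch()
        if status < 500:
            return status, body
        if attempts_left == 0:
            raise RetriesExhausted(
                f"still getting {status} after {retries} retries"
            )
        attempts_left -= 1
        sleep(delay)  # fixed delay, no backoff


def max_outage_covered(retries=60, delay=5):
    """Seconds of continuous 5xx responses the loop can sit out,
    ignoring the time spent in the requests themselves."""
    return retries * delay
```

Under these assumptions, 60 retries with a 5 s delay ride out about 300 s of continuous 502 responses. If the job still fails, the web UI was likely unavailable for longer than that, so adding retries would only widen the window rather than address the cause.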
