
action #159792

Updated by livdywan 8 months ago

## Observation 
 The panel at https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1&from=1714042970812&to=1714056541493 
 shows an alert condition (dashed red line).

 https://mailman.suse.de/mlarch/SuSE/osd-admins/2024/osd-admins.2024.04/msg00148.html 
 is the corresponding alert email. It bundles two alerts, and only the less significant one was commented on. We should still look into the 5xx HTTP response alert problem (see https://progress.opensuse.org/issues/159639).

 Affected routes include: 

 ``` 
 /api/v1/ws/3410 
 /liveviewhandler/tests/14146684/developer/ws-proxy/status 
 ``` 
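To see how often each of these routes fails, the reverse proxy's access log can be filtered by route pattern. The following is a minimal sketch that assumes the proxy writes the Combined Log Format (available as the `combined` format in both Apache and NGINX). Several parts are made up for illustration: the function name `count_5xx_by_route`, the route regexes that generalize the IDs above, and the sample lines, which use the 192.0.2.0/24 documentation IP range.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Hypothetical patterns for the two affected route families; IDs vary per worker/job.
ROUTES = {
    "api_ws": re.compile(r"^/api/v1/ws/\d+$"),
    "liveview_status": re.compile(
        r"^/liveviewhandler/tests/\d+/developer/ws-proxy/status$"
    ),
}


def count_5xx_by_route(lines):
    """Count 5xx responses per affected route family."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not m.group("status").startswith("5"):
            continue
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        for name, pattern in ROUTES.items():
            if pattern.match(path):
                counts[name] += 1
    return counts


# Made-up sample lines, not taken from a real log.
sample = [
    '192.0.2.10 - - [01/Jan/2024:12:00:00 +0000] "GET /api/v1/ws/3410 HTTP/1.1" 500 0 "-" "-"',
    '192.0.2.11 - - [01/Jan/2024:12:00:05 +0000] "GET /liveviewhandler/tests/14146684/developer/ws-proxy/status HTTP/1.1" 502 0 "-" "-"',
    '192.0.2.12 - - [01/Jan/2024:12:00:06 +0000] "GET /api/v1/ws/3410 HTTP/1.1" 101 0 "-" "-"',
]
print(dict(count_5xx_by_route(sample)))
```

The `101` line is a successful websocket upgrade, so it is not counted.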

 As there doesn't seem to be a proper error in the log, we should make sure that we get a usable error message stating the error as well as the file and line number where it was raised. 
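The kind of message asked for above can be sketched as a wrapper around a request handler. The wrapper catches unhandled exceptions and logs the exception type, message, file and line before returning a 500. openQA's routes are served by Mojolicious (Perl), so this Python version only illustrates the idea. `log_unhandled_errors`, the `webui` logger name, the `ws_status` handler and the `(status, body)` return value are hypothetical stand-ins, not openQA APIs.

```python
import functools
import logging
import traceback

log = logging.getLogger("webui")  # hypothetical logger name


def log_unhandled_errors(handler):
    """Wrap a request handler so unhandled exceptions are logged with file and line."""

    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # The last traceback entry is the frame where the exception was raised.
            frame = traceback.extract_tb(exc.__traceback__)[-1]
            log.error(
                "%s: %s (raised at %s line %d)",
                type(exc).__name__, exc, frame.filename, frame.lineno,
            )
            # Stand-in for a framework response object.
            return 500, "Internal Server Error"

    return wrapper


@log_unhandled_errors
def ws_status(job_id):  # hypothetical handler for illustration
    raise ConnectionResetError(f"lost connection for job {job_id}")


logging.basicConfig(level=logging.INFO)
print(ws_status(14146684))
```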

 ## Acceptance Criteria 
 * **AC1**: If the reverse proxy logs a 500 error, there is also a corresponding log message in the logs of the underlying service. 
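One way to check AC1 is to compare timestamps, since every 5xx entry in the proxy log should have a service log entry close to it. The following is a minimal sketch with several assumptions: both logs have already been parsed into `datetime` objects, the clocks agree, and a ±5 s window is appropriate (this window is an arbitrary choice). `unmatched_proxy_errors` is a hypothetical helper.

```python
import bisect
from datetime import datetime, timedelta


def unmatched_proxy_errors(proxy_errors, service_errors, window=timedelta(seconds=5)):
    """Return proxy 5xx timestamps with no service error entry within +/- window."""
    service = sorted(service_errors)
    unmatched = []
    for ts in proxy_errors:
        # First service entry at or after ts - window; it matches if it is also <= ts + window.
        i = bisect.bisect_left(service, ts - window)
        if i == len(service) or service[i] > ts + window:
            unmatched.append(ts)
    return unmatched


t0 = datetime(2024, 1, 1, 12, 0, 0)
proxy = [t0, t0 + timedelta(minutes=1)]
service = [t0 + timedelta(seconds=2)]
print(unmatched_proxy_errors(proxy, service))
```

An empty result would mean AC1 holds for the examined period.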

 ## Suggestions 

 * Confirm whether and why this only happens on certain workers (see #159639#note-8): 
   * 10.161.50.3 
   * 10.100.2.148 
   * Maybe those machines are outdated? If so, consider updating them 
 * Extend error handling and logging on likely relevant routes 
   * api/v1/ws 
   * liveviewhandler/tests/.../developer/ws-proxy/status 
   * Also keep in mind the openqa-websockets service (which the worker connects to) 
 * Maybe the connection was lost (i.e. the gateway could be reached and the websocket connection was established, but the connection dropped at some point) and Mojolicious doesn't show a good error message in that case 
 * Maybe this goes away after switching to NGINX (and implementing the same kind of monitoring for NGINX)
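If the cause is a dropped connection, it would help if the log said how the websocket ended. RFC 6455 defines close codes for this. Code 1006 is reserved for a connection that closed abnormally without a Close frame, which matches the "connection was lost" case. The following is a minimal sketch in Python that assumes the websocket layer hands the close code and reason to a callback. `describe_close`, the severity split and the message wording are hypothetical, and only a subset of the RFC's codes is listed.

```python
# Subset of the close codes defined in RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away (endpoint shutting down or navigating away)",
    1002: "protocol error",
    1006: "abnormal closure (connection dropped without a Close frame)",
    1011: "server encountered an unexpected condition",
}

# Codes treated here as "something went wrong" rather than a deliberate close.
ABNORMAL = {1002, 1006, 1011}


def describe_close(code, reason=""):
    """Build a (severity, message) pair for a websocket close event."""
    text = CLOSE_CODES.get(code, "code not in this table")
    message = f"websocket closed with code {code}: {text}"
    if reason:
        message += f" (reason: {reason})"
    severity = "error" if code in ABNORMAL else "info"
    return severity, message


print(describe_close(1006))
```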
