action #159792
Updated by okurz 6 months ago
## Observation
https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1&from=1714042970812&to=1714056541493
shows an alert condition (dashed red line)
https://mailman.suse.de/mlarch/SuSE/osd-admins/2024/osd-admins.2024.04/msg00148.html
is the corresponding alert, which bundles two alerts; only the less significant one was commented on. We should still look into the 5xx HTTP response alert problem, see #159639 (https://progress.opensuse.org/issues/159639).
Affected routes include:
```
/api/v1/ws/3410
```
```
/liveviewhandler/tests/14146684/developer/ws-proxy/status
```
As there doesn't seem to be a proper error in the log, we should make sure that we get a usable error message indicating the error as well as the file and line number where it was raised.
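To illustrate what such a usable error message could contain, here is a minimal, language-neutral sketch in Python. It is not the openQA implementation (which is Perl/Mojolicious); the names `describe_exception`, `guarded` and the logger name `route-errors` are hypothetical and only serve the example.

```python
import logging
import traceback

log = logging.getLogger("route-errors")  # hypothetical logger name


def describe_exception(exc: BaseException) -> str:
    """Return 'ExcType: message at file:line' for the frame that raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        # The last frame is the innermost one, i.e. where the error was raised.
        origin = f"{frames[-1].filename}:{frames[-1].lineno}"
    else:
        origin = "unknown location"
    return f"{type(exc).__name__}: {exc} at {origin}"


def guarded(route: str, handler, *args, **kwargs):
    """Run a handler; on failure log route, error, file and line, then re-raise."""
    try:
        return handler(*args, **kwargs)
    except Exception as exc:
        log.error("500 on %s: %s", route, describe_exception(exc))
        raise
```

The point of the sketch is only that every code path that ends in a 500 response leaves a log line naming the route, the error and its origin; how this maps onto the actual route handlers is left open.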
## Acceptance Criteria
* **AC1**: If a 500 error is logged by the reverse-proxy there is also a corresponding log message in the underlying service logs.
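One possible way to check AC1 is to correlate 5xx entries in the reverse-proxy access log with error timestamps from the service log. The following Python sketch assumes a "combined"-style access log line and that service error timestamps have already been extracted as timezone-aware datetimes; the actual log formats on OSD may differ, and the function names and the 5-second window are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed "combined"-style access log layout; the real proxy format may differ.
ACCESS_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def proxy_5xx(lines):
    """Yield (timestamp, path, status) for every 5xx entry in the access log."""
    for line in lines:
        m = ACCESS_RE.search(line)
        if m and m["status"].startswith("5"):
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m["path"], int(m["status"])


def unmatched_5xx(access_lines, service_errors, window=timedelta(seconds=5)):
    """Return 5xx proxy entries with no service error within `window`.

    `service_errors` is a list of timezone-aware datetimes taken from the
    service log; how they are extracted is outside the scope of this sketch.
    """
    return [
        entry
        for entry in proxy_5xx(access_lines)
        if not any(abs(entry[0] - err_ts) <= window for err_ts in service_errors)
    ]
```

An empty result from `unmatched_5xx` would mean that every 5xx response seen by the proxy has a nearby error in the service log, which is what AC1 asks for.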
## Suggestions
* Confirm whether and why this is only happening on certain workers (see #159639#note-8)
  * Maybe those machines are outdated? Consider updating the system
  * 10.161.50.3
  * 10.100.2.148
* Extend error handling and logging on likely relevant routes
  * api/v1/ws
  * liveviewhandler/tests/.../developer/ws-proxy/status
* Also keep in mind the openqa-websockets service (which the workers connect to)
* Maybe the connection was lost (the gateway could be reached and the websocket connection was established, but the connection was lost at some point) and Mojolicious doesn't show a good error message in that case
* Maybe this goes away after switching to NGINX (and also implementing this kind of monitoring for NGINX)