action #159792
Add better logging for 500 errors on websocket routes size:M
Status: closed
Start date: 2024-04-26
% Done: 0%
Description
Observation
The panel at https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1&from=1714042970812&to=1714056541493
shows an alert condition (dashed red line).
The corresponding alert mail, https://mailman.suse.de/mlarch/SuSE/osd-admins/2024/osd-admins.2024.04/msg00148.html,
bundles two alerts, and only the less significant one was commented on. We should still look into the 5xx HTTP response alert problem from #159639.
Affected routes include:
- /api/v1/ws/3410
- /liveviewhandler/tests/14146684/developer/ws-proxy/status
As there does not seem to be a proper error message in the log, we should make sure that we get a usable error message stating the error and the file and line number where it was raised.
Acceptance Criteria
- AC1: If the reverse proxy logs a 500 error, there is also a corresponding log message in the logs of the underlying service.
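One way to check AC1 is to cross-reference both logs. The Python sketch below assumes the reverse-proxy and service logs have already been parsed into simple records. The record types and the matching rule (a service entry within a few seconds that mentions the request path) are hypothetical choices, not an existing openQA tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProxyEntry:  # hypothetical parsed reverse-proxy access-log line
    time: datetime
    status: int
    path: str


@dataclass(frozen=True)
class ServiceEntry:  # hypothetical parsed service-log line
    time: datetime
    message: str


def unmatched_500s(proxy: list[ProxyEntry], service: list[ServiceEntry],
                   window: timedelta = timedelta(seconds=5)) -> list[ProxyEntry]:
    """Return proxy 500s with no service entry mentioning the path within +/- window."""
    missing = []
    for p in proxy:
        if p.status != 500:
            continue
        # Assumed matching rule: close in time AND the service message names the path.
        if not any(abs(s.time - p.time) <= window and p.path in s.message for s in service):
            missing.append(p)
    return missing
```

Under these assumptions, AC1 holds for a given time range when `unmatched_500s` returns an empty list.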
Suggestions
- Confirm whether, and why, this only happens on certain workers (see #159639#note-8)
  - Maybe those machines are outdated? Consider updating them:
    - 10.161.50.3
    - 10.100.2.148
- Extend error handling and logging on the likely relevant routes:
  - api/v1/ws
  - liveviewhandler/tests/.../developer/ws-proxy/status
  - Also keep in mind the openqa-websockets service (which the worker connects to)
- Maybe the connection was lost: the gateway could be reached and the websocket connection was established, but the connection dropped at some point, and Mojolicious does not show a useful error message in that case.
- Maybe this goes away after switching to NGINX (and implementing equivalent monitoring for NGINX).