action #159792
Updated by okurz about 1 month ago
## Observation

https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1&from=1714042970812&to=1714056541493 shows an alert condition (dashed red line).

https://mailman.suse.de/mlarch/SuSE/osd-admins/2024/osd-admins.2024.04/msg00148.html is the corresponding alert. It bundles two alerts, and only the less significant one was commented on. We should still look into the 5xx HTTP response alert problem, see #159639 (https://progress.opensuse.org/issues/159639).

Affected routes include:

```
/api/v1/ws/3410
```

```
/liveviewhandler/tests/14146684/developer/ws-proxy/status
```

As there doesn't seem to be a proper error in the log, we should make sure that we get a usable error message indicating the error as well as the file and line number where it was raised.

## Acceptance Criteria

* **AC1**: If a 500 error is logged by the reverse-proxy there is also a corresponding log message in the underlying service logs.

## Suggestions

* Confirm how this is only happening on certain workers (see #159639#note-8)
  * Maybe those machines are outdated? Consider updating the system
    * 10.161.50.3
    * 10.100.2.148
* Extend error handling and logging on likely relevant routes
  * api/v1/ws
  * liveviewhandler/tests/.../developer/ws-proxy/status
  * Also keep in mind the openqa-websockets service (which the worker connects to)
* Maybe the connection was lost (so the gateway could be reached and the websocket connection was established, but at some point the connection was lost) and Mojolicious doesn't show a good error message in that case
* Maybe this goes away after switching to NGINX (and also implementing the same kind of monitoring for NGINX)
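
To check AC1 against existing logs, one option is to correlate 5xx entries from the reverse-proxy access log with entries in the service log. The following is only a minimal sketch, not a description of the actual setup: it assumes the proxy writes an Apache "combined"-style access log, and that the service log has already been parsed into `(timestamp, message)` tuples. The function names `proxy_5xx` and `unmatched_5xx` and the 5 second matching window are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: Apache "combined"-style access log lines, e.g.
# 1.2.3.4 - - [25/Apr/2024:12:00:00 +0000] "GET /path HTTP/1.1" 500 0 "-" "-"
# The real reverse-proxy log format may differ.
ACCESS_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def proxy_5xx(lines):
    """Yield (timestamp, path, status) for each 5xx response in the proxy log."""
    for line in lines:
        m = ACCESS_RE.search(line)
        if m and m.group("status").startswith("5"):
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group("path"), int(m.group("status"))


def unmatched_5xx(proxy_lines, service_entries, window=timedelta(seconds=5)):
    """Return proxy 5xx entries that have no service log entry mentioning
    the same path within `window` of the proxy timestamp.

    service_entries: iterable of (timezone-aware datetime, message) tuples.
    How the service log is parsed into this shape is left open here.
    """
    service_entries = list(service_entries)
    missing = []
    for ts, path, status in proxy_5xx(proxy_lines):
        matched = any(
            abs(s_ts - ts) <= window and path in msg
            for s_ts, msg in service_entries
        )
        if not matched:
            missing.append((ts, path, status))
    return missing
```

Any entry returned by `unmatched_5xx` would be a 5xx response that the reverse-proxy logged without a corresponding service log message, i.e. a violation of AC1. Matching on the request path is a simplification; a request ID shared between proxy and service logs would be more reliable if one is available.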