action #159792

closed

Add better logging for 500 errors on websocket routes size:M

Added by dheidler 2 months ago. Updated 19 days ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2024-04-26
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1&from=1714042970812&to=1714056541493
shows an alert condition (dashed red line)

https://mailman.suse.de/mlarch/SuSE/osd-admins/2024/osd-admins.2024.04/msg00148.html
is the corresponding alert. It bundles two alerts, of which only the less significant one was commented on, so the 5xx HTTP response alert problem (#159639) still needs to be looked into.

Affected routes include:

/api/v1/ws/3410
/liveviewhandler/tests/14146684/developer/ws-proxy/status

As there doesn't seem to be a proper error in the log, we should make sure that we get a usable error message indicating the error as well as the file and line number where it was raised.
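
To illustrate the kind of message meant here, the following is a minimal sketch of a wrapper that logs the error type, message, file and line number of any exception escaping a handler. It is written in Python purely for illustration and is not the service's actual code; the logger name, `describe_exception`, `log_unhandled` and `ws_handler` are hypothetical names.

```python
import logging
import traceback
from functools import wraps

log = logging.getLogger("ws-routes")  # hypothetical logger name


def describe_exception(exc: BaseException) -> str:
    """Format an exception as 'Type: message (file:line in function)'."""
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        return f"{type(exc).__name__}: {exc} (no traceback available)"
    origin = frames[-1]  # innermost frame, i.e. where the error was raised
    return (
        f"{type(exc).__name__}: {exc} "
        f"({origin.filename}:{origin.lineno} in {origin.name})"
    )


def log_unhandled(route: str):
    """Decorator: log any exception escaping a handler, then re-raise it."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            except Exception as exc:
                # One log line per failure, so a 500 is never silent.
                log.error("500 on %s: %s", route, describe_exception(exc))
                raise
        return wrapper
    return decorator


@log_unhandled("/api/v1/ws")
def ws_handler(worker_id: int):
    # Hypothetical handler that fails, standing in for a lost connection.
    raise RuntimeError(f"connection to worker {worker_id} lost")
```

Re-raising after logging keeps the existing error handling (and the resulting 500 response) unchanged; the wrapper only adds the missing log line.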

Acceptance Criteria

  • AC1: If a 500 error is logged by the reverse proxy, there is also a corresponding log message in the underlying service logs.
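
One way AC1 could be checked is sketched below, assuming both logs have already been parsed into timestamped entries. The record types, the function name `unexplained_500s` and the 5-second window are assumptions made for illustration; matching purely on time is a crude heuristic, and the real log formats are not covered here.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ProxyEntry(NamedTuple):
    """One parsed reverse-proxy access log line (hypothetical shape)."""
    time: datetime
    path: str
    status: int


class ServiceEntry(NamedTuple):
    """One parsed service log line (hypothetical shape)."""
    time: datetime
    level: str
    message: str


def unexplained_500s(proxy, service, window=timedelta(seconds=5)):
    """Return proxy entries with status 500 that have no service log entry
    of level 'error' within `window` of the proxy timestamp."""
    errors = [s for s in service if s.level == "error"]
    return [
        p for p in proxy
        if p.status == 500
        and not any(abs(s.time - p.time) <= window for s in errors)
    ]
```

AC1 would hold for a given time range if this function returns an empty list for the entries of that range.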

Suggestions

  • Confirm whether this is only happening on certain workers (see #159639#note-8)
    • Maybe those machines are outdated? Consider updating the system
    • 10.161.50.3
    • 10.100.2.148
  • Extend error handling and logging on likely relevant routes
    • api/v1/ws
    • liveviewhandler/tests/.../developer/ws-proxy/status
    • Also keep in mind the openqa-websockets service (which the worker connects to)
  • Maybe the connection was lost (the gateway could be reached and the websocket connection was established, but the connection was lost at some point) and Mojolicious doesn't show a good error message in that case
  • Maybe this goes away after switching to NGINX (and also implementing the same kind of monitoring for NGINX)

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #159639: [alert] "web UI: Too many 5xx HTTP responses alert" size:S (Status: Resolved, Assignee: dheidler, Start date: 2024-04-26)
