action #164709

closed

OpenQA logreport for o3 logging livestream connection refused errors

Added by livdywan 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Observation

OpenQA logreport for ariel.suse-dmz.opensuse.org is reporting the following errors:

[2024-07-30T15:07:24.974976Z] [error] [pid:14909] Unable to ask worker 759 to start providing livestream for 4368112: Connection refused

The errors come in pairs for the same "worker".

Acceptance criteria

  • AC1: No errors about live stream connections being refused are reported

Suggestions

  • Investigate what happens when such errors are logged, and whether the live view still works or jobs fail. At least no user reports of such issues are known.
Actions #1

Updated by okurz 5 months ago

  • Category set to Regressions/Crashes
Actions #2

Updated by mkittler 5 months ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #3

Updated by mkittler 5 months ago

Investigate what happens when such errors are logged, and if the liveview still works or jobs fail - no user reports of such issues are known at least

It means the websocket server (the service running via the systemd unit openqa-websockets.service) is down. So the refused connection is local to the web UI host: the web UI fails to reach the websocket server, and the refusal does not come from the worker. No jobs fail, but the live view will not show anything (until the "Live" tab is reentered under better conditions). So the impact is very low.
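
To illustrate what "connection refused" means for a local service, here is a minimal Python sketch (not openQA code) that probes whether something is listening on a local TCP port. The function name `local_service_accepts_connections` is hypothetical; port 9527 in the usage comment is the one shown in the openqa-websockets log output of this ticket.

```python
import socket


def local_service_accepts_connections(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connect succeeds, False if it is refused or times out.

    Hypothetical helper for illustration only; it is not part of openQA.
    """
    try:
        # create_connection raises ConnectionRefusedError when nothing
        # is listening on the target port, e.g. while a service restarts.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (ConnectionRefusedError, socket.timeout):
        return False


# Usage, assuming the websocket server listens on the port from the logs:
# local_service_accepts_connections(9527)
```

While the service is stopped, such a probe fails immediately with a refusal rather than hanging, which matches the error logged by the web UI.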

Actions #4

Updated by mkittler 5 months ago

The websockets server was just restarted:

martchus@ariel:~> sudo journalctl --since '2024-07-30 15:00:00' -fu openqa-websockets.service 
Jul 30 15:07:22 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://127.0.0.1:9527
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://[::1]:9527
Jul 30 15:07:22 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.
Jul 30 15:07:22 ariel systemd[1]: Stopped The openQA WebSockets server.
… The request happened between those log lines. …
Jul 30 15:07:26 ariel systemd[1]: Started The openQA WebSockets server.
Jul 30 16:55:57 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://127.0.0.1:9527
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://[::1]:9527
Jul 30 16:55:58 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.

I'm wondering whether I should add a retry here. Considering the low impact, it also makes sense to downgrade this from an error to just a warning.
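
The retry idea could look roughly like the following Python sketch. It is not the openQA implementation (which is not shown in this ticket); `ask_with_retry`, `send_request`, and the logger name are hypothetical, and the attempt count and delay are assumptions. In the log above the restart took about four seconds, so the total retry window would need to cover at least that.

```python
import logging
import time

log = logging.getLogger("livestream-sketch")  # hypothetical logger name


def ask_with_retry(send_request, attempts=3, delay=2.0, sleep=time.sleep):
    """Call send_request(), retrying when the connection is refused.

    Returns the result of send_request(), or None if every attempt was
    refused. Failures are logged at a low severity because the impact
    (an empty live view) is minor.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send_request()
        except ConnectionRefusedError as err:
            if attempt == attempts:
                log.debug("Giving up after %d attempts: %s", attempts, err)
                return None
            log.debug("Attempt %d refused, retrying in %ss", attempt, delay)
            sleep(delay)
    return None
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting.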

Actions #6

Updated by mkittler 5 months ago

  • Status changed from In Progress to Feedback
  • Priority changed from High to Normal
Actions #7

Updated by mkittler 5 months ago

  • Status changed from Feedback to Resolved

The PR has been merged and deployed. With this now being just a debug message, logwarn shouldn't complain about it anymore.

I would not rephrase the message, because explaining the impact would make it too long and our logs are noisy enough. The log level should speak for itself. (As a debug message it can still help while debugging the live mode, but then one is probably close enough to the code not to need any further explanation.)
