action #164709
closed
OpenQA logreport for o3 logging livestream connection refused errors
Added by livdywan 4 months ago.
Updated 4 months ago.
Category: Regressions/Crashes
Description
Observation
OpenQA logreport for ariel.suse-dmz.opensuse.org is reporting the following errors:
[2024-07-30T15:07:24.974976Z] [error] [pid:14909] Unable to ask worker 759 to start providing livestream for 4368112: Connection refused
They come in pairs for the same "worker".
Acceptance criteria
- AC1: No errors about live stream connections being refused are reported
Suggestions
- Investigate what happens when such errors are logged: does the live view still work, and do jobs fail? At least no user reports of such issues are known.
- Category set to Regressions/Crashes
- Status changed from New to In Progress
- Assignee set to mkittler
> Investigate what happens when such errors are logged, and if the liveview still works or jobs fail - no user reports of such issues are known at least
It means the websocket server (the service running via the systemd unit openqa-websockets.service) is down. So the refused connection is local to the web UI host; it does not come from the worker. No jobs fail, but the live view will not show anything until the "Live" tab is re-entered while the websocket server is running again. So the impact is very low.
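To illustrate why a stopped local service produces exactly this error, here is a minimal Python sketch (not taken from openQA). Instead of the websocket server's real address 127.0.0.1:9527 from the journal below, it frees an ephemeral port so that nothing listens on it, which mimics the window during a restart. The helper names `free_local_port` and `try_connect` are hypothetical.

```python
import socket


def free_local_port() -> int:
    """Bind to an ephemeral port on 127.0.0.1, then release it.

    Once the socket is closed nothing listens on that port anymore, which
    mimics the websocket server being stopped during a restart.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def try_connect(port: int) -> str:
    """Attempt a TCP connection to 127.0.0.1:port and describe the outcome."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=2):
            return "connected"
    except ConnectionRefusedError:
        # No process listens on the port, so the kernel rejects the
        # connection immediately instead of letting it time out.
        return "Connection refused"


if __name__ == "__main__":
    print(try_connect(free_local_port()))
```

The refusal is immediate, which matches the log: the request fails within the few seconds between "Stopped" and "Started", and nothing on the worker side is involved.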
The websocket server had just been restarted at that time:
martchus@ariel:~> sudo journalctl --since '2024-07-30 15:00:00' -fu openqa-websockets.service
Jul 30 15:07:22 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://127.0.0.1:9527
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://[::1]:9527
Jul 30 15:07:22 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.
Jul 30 15:07:22 ariel systemd[1]: Stopped The openQA WebSockets server.
… The request happened between those log lines. …
Jul 30 15:07:26 ariel systemd[1]: Started The openQA WebSockets server.
Jul 30 16:55:57 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://127.0.0.1:9527
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://[::1]:9527
Jul 30 16:55:58 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.
I'm wondering whether I should add a retry here. Considering the low impact, it also makes the most sense to downgrade this to a warning.
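The issue does not say whether a retry was eventually added, so the following is only a hypothetical Python sketch of what "add a retry" combined with a lower log level could look like. openQA's actual code is different; `ask_with_retry`, the `livestream` logger name and the default `attempts`/`delay` values are illustrative assumptions.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

log = logging.getLogger("livestream")  # hypothetical logger name


def ask_with_retry(
    request: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
) -> Optional[T]:
    """Call `request` up to `attempts` times, sleeping `delay` seconds between tries.

    A refused connection is expected while the websocket server restarts,
    so each failure is logged at debug level rather than as an error.
    Returns the request's result, or None if every attempt was refused.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionRefusedError as err:
            log.debug("Attempt %d/%d refused: %s", attempt, attempts, err)
            if attempt < attempts:
                time.sleep(delay)
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    outcomes = iter([ConnectionRefusedError("Connection refused"), "ok"])

    def fake_request() -> str:
        # Hypothetical stand-in: refused once (server restarting), then succeeds.
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(ask_with_retry(fake_request, delay=0.1))
```

A short retry covers the few seconds a restart takes; giving up silently after the last attempt is acceptable here only because the worst outcome is an empty live view.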
- Status changed from In Progress to Feedback
- Priority changed from High to Normal
- Status changed from Feedback to Resolved
The PR has been merged and deployed. Now that this is just a debug message, logwarn shouldn't complain about it anymore.
I would not rephrase the message: explaining the impact would make it too long, and our logs are noisy enough. The log level should speak for itself. (As a debug message it can still help when debugging the live mode, but then one is probably close enough to the code not to need further explanation.)