action #164709
closed
OpenQA logreport for o3 logging livestream connection refused errors
0%
Description
Observation
OpenQA logreport for ariel.suse-dmz.opensuse.org is reporting the following errors:
[2024-07-30T15:07:24.974976Z] [error] [pid:14909] Unable to ask worker 759 to start providing livestream for 4368112: Connection refused
The errors come in pairs for the same "worker".
Acceptance criteria
- AC1: No errors about live stream connections being refused are reported
Suggestions
- Investigate what happens when such errors are logged, and if the liveview still works or jobs fail - no user reports of such issues are known at least
Updated by mkittler 4 months ago
> Investigate what happens when such errors are logged, and if the liveview still works or jobs fail - no user reports of such issues are known at least
It means the websocket server (the service running via the systemd unit openqa-websockets.service) is down. So this is a web-UI-local connection being refused (and not coming from the worker). No jobs fail, but the live view will not show anything (until the "Live" tab is reentered under better conditions). So the impact is very low.
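One way to confirm this on the web UI host is to check whether anything accepts TCP connections on the websocket server's local port. The following is a minimal sketch and not openQA code: the helper name `port_accepts_connections` is hypothetical, and port 9527 is taken from the journal output quoted in this ticket.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError is a subclass of OSError; it is what a local
        # client sees while nothing is listening on the port.
        return False


if __name__ == "__main__":
    # Port 9527 is the one the websocket server announces in its journal output.
    print(port_accepts_connections("127.0.0.1", 9527))
```

If this prints `False` while the error is being logged, the refusal is local to the web UI host, which matches the explanation above.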
Updated by mkittler 4 months ago
The websockets server was just restarted:
martchus@ariel:~> sudo journalctl --since '2024-07-30 15:00:00' -fu openqa-websockets.service
Jul 30 15:07:22 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://127.0.0.1:9527
Jul 30 15:07:22 ariel openqa-websockets-daemon[775]: Web application available at http://[::1]:9527
Jul 30 15:07:22 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.
Jul 30 15:07:22 ariel systemd[1]: Stopped The openQA WebSockets server.
… The request happened between those log lines. …
Jul 30 15:07:26 ariel systemd[1]: Started The openQA WebSockets server.
Jul 30 16:55:57 ariel systemd[1]: Stopping The openQA WebSockets server...
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://127.0.0.1:9527
Jul 30 16:55:57 ariel openqa-websockets-daemon[14928]: Web application available at http://[::1]:9527
Jul 30 16:55:58 ariel systemd[1]: openqa-websockets.service: Deactivated successfully.
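The timing can also be checked mechanically: the error from the logreport is stamped 15:07:24, which lies between the "Stopped" line at 15:07:22 and the "Started" line at 15:07:26. The sketch below only illustrates that comparison. `in_restart_window` is a hypothetical helper, the year 2024 is supplied by hand because the journal lines omit it, and the code assumes that both logs use the same time zone, as appears to be the case here.

```python
from datetime import datetime


def in_restart_window(event: datetime, stopped: datetime, started: datetime) -> bool:
    """Return True if the event happened while the service was stopped."""
    return stopped <= event <= started


# Timestamps copied from the ticket. The journal format has no year, so it is
# prepended here (assumption: same year as the openQA log entry).
JOURNAL_FORMAT = "%Y %b %d %H:%M:%S"
stopped = datetime.strptime("2024 Jul 30 15:07:22", JOURNAL_FORMAT)
started = datetime.strptime("2024 Jul 30 15:07:26", JOURNAL_FORMAT)

# Timestamp of the "Connection refused" error from the openQA log.
error = datetime.strptime("2024-07-30T15:07:24.974976Z", "%Y-%m-%dT%H:%M:%S.%fZ")

print(in_restart_window(error, stopped, started))
```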
I'm wondering whether I should add a retry here. Considering the low impact, it would also make sense to downgrade this to just a warning.
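To make the retry idea concrete, here is a minimal Python sketch that retries a refused connection a few times and logs the final failure at warning level. It does not reflect how openQA implements this: `call_with_retry`, its parameters and the logger name are all hypothetical.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("livestream-sketch")  # hypothetical logger name


def call_with_retry(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> Optional[T]:
    """Call action(); on ConnectionRefusedError retry up to `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionRefusedError:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                time.sleep(delay)
    # Low-impact failure: report it as a warning instead of an error.
    log.warning("Connection still refused after %d attempts, giving up", attempts)
    return None
```

With a service restart that takes about four seconds, as in the journal above, a handful of attempts spaced a second or two apart would likely bridge the gap. Whether that is worth the added complexity is a judgment call given the low impact.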
Updated by mkittler 4 months ago
- Status changed from Feedback to Resolved
The PR has been merged and deployed. Now that this is just a debug message, logwarn shouldn't complain about it anymore.
I would not rephrase the message because explaining the impact would make it too long, and our logs are noisy enough. The log level should speak for itself. (As a debug message it can still help while debugging the live mode, but at that point one is probably close enough to the code not to need any further explanation.)