action #41222
closed
A lot of occurrences of 'websocket connection closed'
Description
Apparently we are somehow killing worker connections: in just two days there has been a huge number of closed connections.
On osd:
grep -r 'websocket connection closed' /var/log/openqa
[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005
[2018-09-18T02:53:48.0273 CEST] [info] Worker 896 websocket connection closed - 1005
[2018-09-18T02:59:53.0594 CEST] [info] Worker 968 websocket connection closed - 1005
[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006
[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006
[2018-09-18T04:29:24.0790 CEST] [info] Worker 959 websocket connection closed - 1006
[2018-09-18T04:29:45.0718 CEST] [info] Worker 649 websocket connection closed - 1006
[2018-09-18T04:30:11.0060 CEST] [info] Worker 969 websocket connection closed - 1006
[2018-09-18T04:30:11.0898 CEST] [info] Worker 1119 websocket connection closed - 1006
[2018-09-18T04:30:15.0991 CEST] [info] Worker 406 websocket connection closed - 1006
grep -r 'websocket connection closed' /var/log/openqa | wc -l
708
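To see how the 708 matches split between close codes and workers, the grep output can be tallied. The following is only a sketch: the regular expression is derived from the ten sample lines above, and the names `LINE_RE`, `tally_close_codes` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the sample lines in this ticket, e.g.
# [2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005
# search() is used so that a "filename:" prefix added by grep -r does not matter.
LINE_RE = re.compile(
    r"\[(?P<ts>\S+) \S+\] \[info\] Worker (?P<worker>\d+) "
    r"websocket connection closed - (?P<code>\d+)"
)


def tally_close_codes(lines):
    """Count disconnects per close code and per worker id."""
    by_code, by_worker = Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip unrelated log lines
        by_code[int(match.group("code"))] += 1
        by_worker[int(match.group("worker"))] += 1
    return by_code, by_worker


# The ten lines quoted in the description, used here as test input.
SAMPLE = """\
[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005
[2018-09-18T02:53:48.0273 CEST] [info] Worker 896 websocket connection closed - 1005
[2018-09-18T02:59:53.0594 CEST] [info] Worker 968 websocket connection closed - 1005
[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006
[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006
[2018-09-18T04:29:24.0790 CEST] [info] Worker 959 websocket connection closed - 1006
[2018-09-18T04:29:45.0718 CEST] [info] Worker 649 websocket connection closed - 1006
[2018-09-18T04:30:11.0060 CEST] [info] Worker 969 websocket connection closed - 1006
[2018-09-18T04:30:11.0898 CEST] [info] Worker 1119 websocket connection closed - 1006
[2018-09-18T04:30:15.0991 CEST] [info] Worker 406 websocket connection closed - 1006
"""

by_code, by_worker = tally_close_codes(SAMPLE.splitlines())
for code, count in sorted(by_code.items()):
    print(f"close code {code}: {count}")
for worker, count in by_worker.most_common(3):
    print(f"worker {worker}: {count} disconnect(s)")
```

On the sample this gives three closes with 1005 and seven with 1006, and worker 969 shows up twice.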
Note that these are not the cases where we close the connection because the worker is thought to be dead (that should give 1008 as the code): https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebSockets/Server.pm#L202
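For reference, a small lookup of the three close codes mentioned in this ticket. The descriptions are paraphrased from RFC 6455, section 7.4.1; that 1008 marks the "worker thought dead" path is taken from the statement above, not verified against the linked code. The function names are hypothetical.

```python
# Meanings paraphrased from RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1005: "no status code was present in the close frame",
    1006: "abnormal closure: connection lost without a close frame",
    1008: "policy violation",
}


def describe_close_code(code):
    """Return a short description for a close code seen in the logs."""
    return CLOSE_CODES.get(code, "not covered by this sketch")


def looks_like_dead_worker_cleanup(code):
    # Assumption from the ticket text: the server closes the connection
    # with 1008 when it considers the worker dead.
    return code == 1008


for code in (1005, 1006, 1008):
    print(code, describe_close_code(code), looks_like_dead_worker_cleanup(code))
```

Neither 1005 nor 1006 is sent deliberately by a peer as a status, which fits the observation that these disconnects do not come from the dead-worker path.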
Updated by coolo about 6 years ago
Note that a worker deployment (worker restart) looks the same. And with 200 workers you get the number of disconnects high quite easily.
And then there is https://progress.opensuse.org/issues/41027
Updated by EDiGiacinto about 6 years ago
coolo wrote:
Note that a worker deployment (worker restart) looks the same. And with 200 workers you get the number of disconnects high quite easily.
And then there is https://progress.opensuse.org/issues/41027
Yup, but if you look at the logs, it has been happening for basically the whole two days. Those logs also show a different reason for disconnection (1005).
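One way to check whether the disconnects cluster around deployments or are spread over the whole period is to bucket them per hour. This is a minimal sketch that assumes the timestamp format of the log lines quoted in the description; `TS_RE` and `disconnects_per_hour` are illustrative names.

```python
import re
from collections import Counter

# Captures date and hour, e.g. "2018-09-18T02", from a line such as
# [2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005
TS_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+ \S+\] "
    r".*websocket connection closed"
)


def disconnects_per_hour(lines):
    """Return a dict mapping 'YYYY-MM-DDTHH' to the number of disconnects."""
    buckets = Counter()
    for line in lines:
        match = TS_RE.search(line)
        if match:
            buckets[match.group(1)] += 1
    return dict(sorted(buckets.items()))


# Three of the lines quoted in the description, used as example input.
EXAMPLE = [
    "[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005",
    "[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006",
    "[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006",
]

for hour, count in disconnects_per_hour(EXAMPLE).items():
    print(f"{hour}:00 {'#' * count} ({count})")
```

A worker restart should show up as a short spike of roughly as many disconnects as there are workers, while a steady background rate would point to something else.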
Updated by coolo about 6 years ago
- Blocked by action #41027: worker disconnects during cleanup added
Updated by coolo about 6 years ago
- Target version set to future
As I suspect 41027 to be responsible for the majority, I set this to 'future'. Once the other issue is done, we should reevaluate the frequency.
Updated by okurz over 4 years ago
- Status changed from New to Resolved
- Assignee set to okurz
The log message string "websocket connection closed" still exists in our source code, but we do not see these messages a lot (if at all). For example, sudo journalctl -u openqa-websockets | grep -c 'websocket connection closed' returns no match for a 24h period when tests had been running.