action #41222

A lot of occurrences of 'websocket connection closed'

Added by EDiGiacinto almost 3 years ago. Updated 12 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2018-09-19
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Apparently we are killing worker connections somehow: in just two days a huge number of connections have been closed.

On osd:

grep -r 'websocket connection closed' /var/log/openqa                     
[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005 
[2018-09-18T02:53:48.0273 CEST] [info] Worker 896 websocket connection closed - 1005 
[2018-09-18T02:59:53.0594 CEST] [info] Worker 968 websocket connection closed - 1005 
[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006 
[2018-09-18T04:29:24.0790 CEST] [info] Worker 959 websocket connection closed - 1006 
[2018-09-18T04:29:45.0718 CEST] [info] Worker 649 websocket connection closed - 1006 
[2018-09-18T04:30:11.0060 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:30:11.0898 CEST] [info] Worker 1119 websocket connection closed - 1006
[2018-09-18T04:30:15.0991 CEST] [info] Worker 406 websocket connection closed - 1006 

grep -r 'websocket connection closed' /var/log/openqa | wc -l
708

Note that these are not the cases where we close the connection because the worker is thought to be dead; that should give 1008 as the code: https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebSockets/Server.pm#L202
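To see how the matches split across close codes, the log lines can be tallied per code. In RFC 6455, 1005 means no status code was received, 1006 means the connection closed abnormally without a close frame, and 1008 signals a policy violation. The following is a minimal sketch, assuming the log line format shown above; the function name count_close_codes is hypothetical and not part of openQA.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): tally close codes.
# Reads log lines on stdin and prints "<code> <count>", sorted by code.
# Assumes each matching line ends with "websocket connection closed - <code>".
count_close_codes() {
    sed -n 's/.*websocket connection closed - \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Example invocation against the log path used in this report:
# grep -rh 'websocket connection closed' /var/log/openqa | count_close_codes
```

If 1008 never appears in the output, none of the disconnects came from the server dropping a worker it considered dead.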


Related issues

Blocked by openQA Project - action #41027: worker disconnects during cleanup (Resolved; 2018-09-14)

History

#1 Updated by coolo almost 3 years ago

Note that a worker deployment (worker restart) looks the same. And with 200 workers the number of disconnects gets high quite easily.

And then there is https://progress.opensuse.org/issues/41027

#2 Updated by EDiGiacinto almost 3 years ago

coolo wrote:

Note that a worker deployment (worker restart) looks the same. And with 200 workers the number of disconnects gets high quite easily.

And then there is https://progress.opensuse.org/issues/41027

Yup, but if you look at the logs, it has been happening for basically the whole two days. Those logs also show a different reason for disconnection (1005).
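One way to tell a deployment burst apart from a steady stream of disconnects is to bucket the log lines by hour. A worker restart should show up as a spike in one or two buckets, while an ongoing problem would spread across the whole period. This is a rough sketch, assuming the timestamp format from the log excerpt in the description; count_closes_per_hour is a hypothetical name, not an existing openQA tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): disconnects per hour.
# Reads log lines on stdin and prints "<date>T<hour> <count>" in time order.
# Assumes timestamps look like "[2018-09-18T02:50:06.0830 CEST]"; anything
# before the first "[" (such as a file name added by grep -r) is ignored.
count_closes_per_hour() {
    sed -n 's/^[^[]*\[\([0-9-]*T[0-9][0-9]\):.*websocket connection closed.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example invocation against the log path used in this report:
# grep -rh 'websocket connection closed' /var/log/openqa | count_closes_per_hour
```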

#3 Updated by coolo almost 3 years ago

  • Blocked by action #41027: worker disconnects during cleanup added

#4 Updated by coolo almost 3 years ago

  • Target version set to future

As I suspect #41027 to be responsible for the majority, I set this to 'future'. Once the other issue is done, we should reevaluate the frequency.

#5 Updated by okurz about 2 years ago

  • Category set to Concrete Bugs

#6 Updated by okurz 12 months ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The log message string "websocket connection closed" still exists in our source code, but we do not see these messages often (if at all). For example, sudo journalctl -u openqa-websockets | grep -c 'websocket connection closed' found no matches for a 24h period when tests had been running.
