action #41222

A lot of occurrences of 'websocket connection closed'

Added by EDiGiacinto over 1 year ago. Updated 8 months ago.

Status: New
Start date: 19/09/2018
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Concrete Bugs
Target version: QA - future
Difficulty:
Duration:

Description

Apparently we are killing worker connections somehow: in just two days the logs show a huge number of 'websocket connection closed' messages.

On osd:

grep -r 'websocket connection closed' /var/log/openqa                     
[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005 
[2018-09-18T02:53:48.0273 CEST] [info] Worker 896 websocket connection closed - 1005 
[2018-09-18T02:59:53.0594 CEST] [info] Worker 968 websocket connection closed - 1005 
[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006 
[2018-09-18T04:29:24.0790 CEST] [info] Worker 959 websocket connection closed - 1006 
[2018-09-18T04:29:45.0718 CEST] [info] Worker 649 websocket connection closed - 1006 
[2018-09-18T04:30:11.0060 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:30:11.0898 CEST] [info] Worker 1119 websocket connection closed - 1006
[2018-09-18T04:30:15.0991 CEST] [info] Worker 406 websocket connection closed - 1006 

grep -r 'websocket connection closed' /var/log/openqa | wc -l
708

Note that this is not the case where we close the connection because the worker is thought to be dead (that should give 1008 as the close code): https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebSockets/Server.pm#L202
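For reference, in RFC 6455 a close code of 1005 means no status code was received, 1006 means the connection was closed abnormally (no close frame), and 1008 is a policy violation. To see how the 708 messages split across these codes, something like the following could be used. This is only a sketch: the function name count_close_codes is made up for illustration, and it assumes the close code is always the last field of the log line, as in the excerpt above.

```shell
# Hypothetical helper: tally 'websocket connection closed' messages per close code.
# Assumes the code is the last whitespace-separated field of each matching line.
count_close_codes() {
    # $1: log directory (defaults to the path used in this ticket)
    grep -rh 'websocket connection closed' "${1:-/var/log/openqa}" |
        awk '{ count[$NF]++ } END { for (code in count) print code, count[code] }' |
        sort
}
```

Calling count_close_codes /var/log/openqa prints one "code count" line per close code. For just the ten lines pasted above it would report 3 for 1005 and 7 for 1006.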


Related issues

Blocked by openQA Project - action #41027: worker disconnects during cleanup (Resolved, 14/09/2018)

History

#1 Updated by coolo over 1 year ago

Note that a worker deployment (worker restart) looks the same. And with 200 workers you get the number of disconnects high quite easily.

And then there is https://progress.opensuse.org/issues/41027

#2 Updated by EDiGiacinto over 1 year ago

coolo wrote:

Note that a worker deployment (worker restart) looks the same. And with 200 workers you get the number of disconnects high quite easily.


And then there is https://progress.opensuse.org/issues/41027

Yup, but if you look at the logs, it is happening throughout basically the whole two days. These logs also show a different reason for disconnection (1005).
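One way to check whether the disconnects cluster around deployments or are spread over the whole period would be to bucket the messages per hour. A rough sketch follows; the function name closed_per_hour is hypothetical, and it assumes every matching line starts with a timestamp in the [YYYY-MM-DDTHH:MM:SS.ssss format seen in the log excerpt.

```shell
# Hypothetical helper: count 'websocket connection closed' messages per hour.
# Assumes each matching line begins with '[YYYY-MM-DDTHH:...' as in the openQA log excerpt.
closed_per_hour() {
    # $1: log directory (defaults to the path used in this ticket)
    grep -rh 'websocket connection closed' "${1:-/var/log/openqa}" |
        awk '{
            hour = substr($1, 2, 13)   # skip the leading "[", keep "YYYY-MM-DDTHH"
            count[hour]++
        } END { for (h in count) print h, count[h] }' |
        sort
}
```

A worker deployment should then show up as a few hours with a spike, while a steady trickle across all hours would point at something else.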

#3 Updated by coolo over 1 year ago

  • Blocked by action #41027: worker disconnects during cleanup added

#4 Updated by coolo over 1 year ago

  • Target version set to future

As I suspect 41027 to be responsible for the majority, I set this to 'future'. Once the other issue is done, we should reevaluate the frequency.

#5 Updated by okurz 8 months ago

  • Category set to Concrete Bugs
