action #41222 (closed)

A lot of occurrences of 'websocket connection closed'

Added by EDiGiacinto over 6 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Category: Regressions/Crashes
Start date: 2018-09-19
% Done: 0%

Description

Apparently we are killing worker connections somehow: in just two days there is a huge number of closed connections.

On osd:

grep -r 'websocket connection closed' /var/log/openqa
[2018-09-18T02:50:06.0830 CEST] [info] Worker 251 websocket connection closed - 1005 
[2018-09-18T02:53:48.0273 CEST] [info] Worker 896 websocket connection closed - 1005 
[2018-09-18T02:59:53.0594 CEST] [info] Worker 968 websocket connection closed - 1005 
[2018-09-18T03:13:19.0889 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:27:33.0469 CEST] [info] Worker 895 websocket connection closed - 1006 
[2018-09-18T04:29:24.0790 CEST] [info] Worker 959 websocket connection closed - 1006 
[2018-09-18T04:29:45.0718 CEST] [info] Worker 649 websocket connection closed - 1006 
[2018-09-18T04:30:11.0060 CEST] [info] Worker 969 websocket connection closed - 1006 
[2018-09-18T04:30:11.0898 CEST] [info] Worker 1119 websocket connection closed - 1006
[2018-09-18T04:30:15.0991 CEST] [info] Worker 406 websocket connection closed - 1006 

grep -r 'websocket connection closed' /var/log/openqa | wc -l
708

Note that these are not the cases where we close the connection ourselves because the worker is considered dead (that should give 1008 as the close code): https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/WebSockets/Server.pm#L202
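To see which kind of closure dominates, the matching log lines can be tallied per close code. The labels below are the names RFC 6455 assigns to these codes: 1005 (No Status Received) means a Close frame arrived without a status code, 1006 (Abnormal Closure) means the connection dropped without any Close frame, and 1008 (Policy Violation) is the code the server uses when it drops a worker it considers dead. This is a minimal sketch; the function names tally_close_codes and close_code_name are hypothetical helpers, and it assumes the close code is the last whitespace-separated field of each line, as in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: tally "websocket connection closed" log lines by close code.
# Assumption: the close code is the last field, e.g. "... connection closed - 1006".

# Map a WebSocket close code to its RFC 6455 name.
close_code_name() {
  case "$1" in
    1000) echo "Normal Closure" ;;
    1001) echo "Going Away" ;;
    1005) echo "No Status Received" ;;
    1006) echo "Abnormal Closure" ;;
    1008) echo "Policy Violation" ;;
    *)    echo "other" ;;
  esac
}

# Read log lines on stdin and print "<code> <count> <name>" per close code.
tally_close_codes() {
  grep 'websocket connection closed' |
    awk '{ print $NF }' |      # last field; trailing spaces are ignored by awk
    sort | uniq -c |           # "   <count> <code>"
    while read -r count code; do
      printf '%s %s %s\n' "$code" "$count" "$(close_code_name "$code")"
    done
}
```

For example, grep -rh 'websocket connection closed' /var/log/openqa | tally_close_codes would print one line per close code; the -h flag only drops the file-name prefixes that grep -r adds and does not affect the tally.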


Related issues: 1 (0 open, 1 closed)

Blocked by openQA Project (public) - action #41027: worker disconnects during cleanup (Resolved, kraih, 2018-09-14)

#1

Updated by coolo over 6 years ago

Note that a worker deployment (worker restart) looks the same. And with 200 workers the number of disconnects gets high quite easily.

And then there is https://progress.opensuse.org/issues/41027

#2

Updated by EDiGiacinto over 6 years ago

coolo wrote:

> Note that a worker deployment (worker restart) looks the same. And with 200 workers you get the number of disconnects high quite easily.
>
> And then there is https://progress.opensuse.org/issues/41027

Yup, but if you look at the logs, it is happening throughout basically the whole two days, and in those logs you also see a different reason for disconnection (1005).

#3

Updated by coolo over 6 years ago

  • Blocked by action #41027: worker disconnects during cleanup added
#4

Updated by coolo over 6 years ago

  • Target version set to future

As I suspect 41027 to be responsible for the majority, I set this to 'future'. Once the other issue is done, we should re-evaluate the frequency.

#5

Updated by okurz over 5 years ago

  • Category set to Regressions/Crashes
#6

Updated by okurz over 4 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz

The log message string "websocket connection closed" still exists in our source code, but we rarely see these messages, if at all. For example, sudo journalctl -u openqa-websockets | grep -c 'websocket connection closed' returns no match for a 24h period during which tests had been running.
