action #19424

closed

[tools] logwarn: [websockets:error] Worker not found for given connection during connection close

Added by mgriessmeier over 7 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2017-05-30
Due date:
% Done: 0%
Estimated time:

Description

The following message has appeared quite often in the logs lately:

[Tue May 30 06:10:09 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:10:41 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:14 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:46 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:19 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:51 2017] [websockets:error] Worker not found for given connection during connection close
[...]

Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #21836: [tools][sprint 201709.1] Many "A message received from unknown worker connection" log entries on openqa.suse.de | Resolved | EDiGiacinto | 2017-08-08

Actions #1

Updated by okurz over 7 years ago

  • Status changed from New to In Progress

This persisted until yesterday evening. The last such line I can see in the log file for now is:

[Fri Jun  2 23:32:27 2017] [websockets:error] Worker not found for given connection during connection close

I looked at the source code but could not find a good way to improve the error message so that it hints at a specific problem or worker.

Actions #2

Updated by okurz over 7 years ago

Monitoring alert disabled with https://github.com/okurz/openqa_monitoring/pull/10, so don't be confused when you no longer get an email.

Actions #3

Updated by nicksinger over 7 years ago

  • Related to action #21836: [tools][sprint 201709.1] Many "A message received from unknown worker connection" log entries on openqa.suse.de added
Actions #4

Updated by coolo about 7 years ago

  • Priority changed from Normal to High
  • Target version set to Ready

This is still going on.

Actions #5

Updated by szarate about 7 years ago

  • Priority changed from High to Normal
Actions #6

Updated by mkittler almost 6 years ago

Is this still happening? I couldn't see anything in the recent websocket server logs on OSD. I'm not sure why this problem would occur occasionally but it should be easy to rewrite the code to get rid of it (using the same pattern as in the developer mode code).

Actions #7

Updated by okurz over 5 years ago

  • Category changed from 132 to Feature requests
Actions #8

Updated by okurz over 5 years ago

  • Status changed from In Progress to Workable

I just checked the logs on o3 with sudo grep 'Worker not found for given connection' /var/log/openqa and still found a lot of these messages.
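
To get a feel for how frequent the message is, rather than just whether it is present, the matching lines could be bucketed per hour. The following is only a rough sketch: it assumes the log line format shown in the description, and the helper name count_per_hour is made up for illustration.

```python
import re
from collections import Counter

MESSAGE = "Worker not found for given connection during connection close"

# Matches lines such as:
# [Tue May 30 06:10:09 2017] [websockets:error] Worker not found for ...
# The day of month may be padded with an extra space ("Jun  2").
LINE_RE = re.compile(
    r"^\[\w{3} (\w{3}) +(\d{1,2}) (\d{2}):\d{2}:\d{2} (\d{4})\] "
    r"\[websockets:error\] " + re.escape(MESSAGE)
)


def count_per_hour(lines):
    """Count occurrences of the message per hour (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            month, day, hour, year = match.groups()
            counts[f"{year} {month} {int(day):02d} {hour}:00"] += 1
    return counts


# Sample lines copied from this ticket.
sample = [
    "[Tue May 30 06:10:09 2017] [websockets:error] " + MESSAGE,
    "[Tue May 30 06:10:41 2017] [websockets:error] " + MESSAGE,
    "[Tue May 30 06:15:14 2017] [websockets:error] " + MESSAGE,
    "[Fri Jun  2 23:32:27 2017] [websockets:error] " + MESSAGE,
]
counts = count_per_hour(sample)
for bucket, number in counts.items():
    print(bucket, number)
```

In practice the lines would come from the log file instead of the sample list.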

Actions #9

Updated by mkittler over 5 years ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

Since I'm looking at the web socket server code anyway right now, I'll have a look.

Actions #11

Updated by okurz over 5 years ago

PR merged. I am not sure if the PR should be the only thing needed to resolve this ticket though.

Actions #12

Updated by mkittler over 5 years ago

  • Status changed from In Progress to Feedback

Me neither. I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production). So let's see whether this fixes the production case as well (which is - if my theory is correct - that the worker already tries to reconnect while the web socket server hasn't handled the previous disconnect yet).
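
The theory can be illustrated with a toy model. This is not the openQA web socket server code: the Registry class, its methods and the assumption that connections are stored per worker ID are all hypothetical, chosen only to show how a late close event for a stale connection could end up finding no worker.

```python
class Registry:
    """Hypothetical connection registry keyed by worker ID (not openQA code)."""

    def __init__(self):
        self.connections = {}  # worker ID -> connection object
        self.errors = []

    def on_connect(self, worker_id, conn):
        # A reconnect overwrites the entry of the previous connection.
        self.connections[worker_id] = conn

    def on_close(self, conn):
        # Look up the worker by connection identity.
        for worker_id, known in list(self.connections.items()):
            if known is conn:
                del self.connections[worker_id]
                return worker_id
        # The connection was already replaced, so no worker matches it.
        self.errors.append(
            "Worker not found for given connection during connection close"
        )
        return None


registry = Registry()
old, new = object(), object()
registry.on_connect(1, old)
registry.on_connect(1, new)  # worker reconnects before the old close is handled
registry.on_close(old)       # late close event for the stale connection
print(registry.errors)
```

In this model the worker stays registered with its new connection, and the only effect of the late close event is the logged error.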

Actions #13

Updated by okurz over 5 years ago

mkittler wrote:

I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production)

Can an admin really configure the worker this way by mistake, or would that require changing the code?

Actions #14

Updated by mkittler over 5 years ago

If you add a worker host twice in the config the worker will connect twice as if they were different hosts. No de-duplication happens. (I guess this was also the case before my worker restructuring. At least there was no explicit code for de-duplication.)

I could of course change it so the hostname/URL would be de-duplicated at least on string-level.
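
A string-level de-duplication could look roughly like the sketch below. It is not the worker's actual config handling: dedupe_hosts is a made-up name, and the normalization (trimming whitespace, dropping a trailing slash, ignoring case) is an assumption about what "string-level" would cover.

```python
def dedupe_hosts(hosts):
    """Drop repeated host entries, keeping the first occurrence (hypothetical)."""
    seen = set()
    unique = []
    for host in hosts:
        # Illustrative normalization only; real URLs may need more care.
        key = host.strip().rstrip("/").lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(host.strip())
    return unique


hosts = ["openqa.example.com", "openqa.example.com ", "http://localhost/", "http://localhost"]
print(dedupe_hosts(hosts))
```

Two different names that resolve to the same web UI would still count as distinct hosts with this approach.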

Actions #15

Updated by okurz over 5 years ago

I would rather have the worker die hard on this configuration error; don't try to be too smart in code when the admin messes up :)
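
Failing hard could be as simple as a check at startup. Again, this is only a sketch under assumptions: validate_hosts and DuplicateHostError are hypothetical names, and entries are compared as plain strings after trimming whitespace.

```python
from collections import Counter


class DuplicateHostError(ValueError):
    """Raised when a host is configured more than once (hypothetical)."""


def validate_hosts(hosts):
    """Return the host list unchanged, or raise if any entry is repeated."""
    counts = Counter(host.strip() for host in hosts)
    duplicates = sorted(host for host, number in counts.items() if number > 1)
    if duplicates:
        raise DuplicateHostError(
            "host configured more than once: " + ", ".join(duplicates)
        )
    return list(hosts)


print(validate_hosts(["openqa.example.com", "localhost"]))
```

Calling validate_hosts with a repeated entry raises DuplicateHostError, which the worker could let propagate so that it exits with a clear message.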

Actions #16

Updated by mkittler over 5 years ago

  • Status changed from Feedback to Resolved

I've just had a brief look at the recent OSD logs and it is not happening anymore. I'd say making the worker fail in this case is a different issue.

Actions #17

Updated by coolo about 5 years ago

  • Target version changed from Ready to Done