action #19424

[tools] logwarn: [websockets:error] Worker not found for given connection during connection close

Added by mgriessmeier over 2 years ago. Updated 5 months ago.

Status: Resolved
Start date: 30/05/2017
Priority: Normal
Due date:
Assignee: mkittler
% Done: 0%
Category: Feature requests
Target version: Done
Difficulty:
Duration:

Description

The following message has been appearing quite often in the logs lately:

[Tue May 30 06:10:09 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:10:41 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:14 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:46 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:19 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:51 2017] [websockets:error] Worker not found for given connection during connection close
[...]

Related issues

Related to openQA Project - action #21836: [tools][sprint 201709.1] Many "A message received from un... Resolved 08/08/2017

History

#1 Updated by okurz over 2 years ago

  • Status changed from New to In Progress

This persisted until yesterday evening. The last occurrence I can see in the log file for now:

[Fri Jun  2 23:32:27 2017] [websockets:error] Worker not found for given connection during connection close

I looked at the source code itself but could not find a good way to improve the error message so that it hints at a specific problem or worker.

#2 Updated by okurz over 2 years ago

Monitoring alert disabled with https://github.com/okurz/openqa_monitoring/pull/10, so don't be confused when you no longer receive an email.

#3 Updated by nicksinger over 2 years ago

  • Related to action #21836: [tools][sprint 201709.1] Many "A message received from unknown worker connection" log entries on openqa.suse.de added

#4 Updated by coolo over 2 years ago

  • Priority changed from Normal to High
  • Target version set to Ready

This is still going on.

#5 Updated by szarate about 2 years ago

  • Priority changed from High to Normal

#6 Updated by mkittler about 1 year ago

Is this still happening? I couldn't see anything in the recent websocket server logs on OSD. I'm not sure why this problem would occur only occasionally, but it should be easy to rewrite the code to get rid of it (using the same pattern as in the developer mode code).

#7 Updated by okurz 8 months ago

  • Category changed from 132 to Feature requests

#8 Updated by okurz 7 months ago

  • Status changed from In Progress to Workable

I just checked the logs on o3 with sudo grep 'Worker not found for given connection' /var/log/openqa and still found a lot of these messages.
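Beyond a plain grep, it can help to see whether the messages are tapering off over time. The following is a minimal Python sketch, not part of openQA, that tallies the messages per day; it assumes every log line starts with the timestamp prefix shown in the description, and the names `count_per_day`, `TIMESTAMP_RE` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Timestamp prefix as seen in the log excerpts above, e.g.
# "[Tue May 30 06:10:09 2017] ..." or "[Fri Jun  2 23:32:27 2017] ..."
TIMESTAMP_RE = re.compile(r"^\[(\w{3} \w{3} +\d{1,2}) [\d:]{8} (\d{4})\] ")
MESSAGE = "Worker not found for given connection"


def count_per_day(lines):
    """Count lines containing MESSAGE, grouped by calendar day."""
    counts = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            # Collapse the double space before single-digit days ("Jun  2").
            day = " ".join(match.group(1).split()) + " " + match.group(2)
            counts[day] += 1
    return counts


SAMPLE = [
    "[Tue May 30 06:10:09 2017] [websockets:error] Worker not found for given connection during connection close",
    "[Tue May 30 06:10:41 2017] [websockets:error] Worker not found for given connection during connection close",
    "[Fri Jun  2 23:32:27 2017] [websockets:error] Worker not found for given connection during connection close",
]

counts = count_per_day(SAMPLE)
print(counts.most_common())
```

Against the real log you would pass an open file instead (for example inside `with open("/var/log/openqa") as f:`, run with sufficient permissions). Lines whose prefix does not match the assumed format are skipped rather than counted.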

#9 Updated by mkittler 7 months ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

Since I'm looking at the web socket server code right now anyway, I'll have a look.

#11 Updated by okurz 7 months ago

PR merged. I am not sure whether the PR alone is enough to resolve this ticket, though.

#12 Updated by mkittler 7 months ago

  • Status changed from In Progress to Feedback

Me neither. I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production). So let's see whether this fixes the production case as well. If my theory is correct, that case occurs when the worker already tries to reconnect while the web socket server hasn't yet handled the previous disconnect.
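The suspected race can be illustrated with a small model. This is a hypothetical Python sketch, not openQA's actual websocket server code: it assumes the server keeps one "current" connection per worker ID and, on close, looks up the worker by connection. The class and method names are invented for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("websockets")


class ConnectionRegistry:
    """Hypothetical model: the server remembers one current connection per worker ID."""

    def __init__(self):
        self.connection_by_worker = {}  # worker ID -> current connection

    def on_connect(self, worker_id, conn):
        # A (re)connect simply replaces whatever connection was stored for this worker.
        self.connection_by_worker[worker_id] = conn

    def on_close(self, conn):
        # Look up which worker the closing connection belongs to.
        worker_id = next(
            (wid for wid, current in self.connection_by_worker.items() if current is conn),
            None,
        )
        if worker_id is None:
            log.error("Worker not found for given connection during connection close")
            return None
        del self.connection_by_worker[worker_id]
        return worker_id


registry = ConnectionRegistry()
old_conn, new_conn = object(), object()
registry.on_connect(1, old_conn)      # worker 1 connects
registry.on_connect(1, new_conn)      # it reconnects before the old disconnect is handled
result = registry.on_close(old_conn)  # the late close for old_conn matches no worker
```

The same model covers the duplicate-config reproduction: two simultaneous connections for one worker ID overwrite each other in the map, so one of the two close events goes unmatched and triggers the error.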

#13 Updated by okurz 7 months ago

mkittler wrote:

I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production)

Could an admin really configure the worker this way by mistake, or would that require changing the code?

#14 Updated by mkittler 7 months ago

If you add a host twice to the worker config, the worker connects to it twice as if they were different hosts; no de-duplication happens. (I guess this was also the case before my worker restructuring; at least there was no explicit de-duplication code.)

I could of course change it so that the hostname/URL is de-duplicated, at least at the string level.

#15 Updated by okurz 7 months ago

I would rather have the worker die hard on this configuration error. Don't try to be too smart in code when the admin messes up :)
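This exchange weighs two options: silently de-duplicating hosts at the string level (#14) or dying hard on the configuration error (#15). A hypothetical Python sketch contrasts them. It assumes the configured hosts arrive as a list of strings, and the function names are illustrative only, not openQA's worker configuration API.

```python
from collections import Counter


def dedupe_hosts(hosts):
    """Option from #14: drop exact (string-level) duplicates, keeping the first occurrence."""
    return list(dict.fromkeys(host.strip() for host in hosts))


def find_duplicate_hosts(hosts):
    """Return every host string that appears more than once, sorted."""
    counts = Counter(host.strip() for host in hosts)
    return sorted(host for host, n in counts.items() if n > 1)


def require_unique_hosts(hosts):
    """Option from #15: die hard on the configuration error instead of fixing it."""
    duplicates = find_duplicate_hosts(hosts)
    if duplicates:
        raise SystemExit("duplicate host(s) in worker configuration: " + ", ".join(duplicates))
    return [host.strip() for host in hosts]


hosts = ["http://openqa.example.com", "http://openqa.example.com", "http://other.example.com"]
print(dedupe_hosts(hosts))
print(find_duplicate_hosts(hosts))
```

Calling `require_unique_hosts(hosts)` on this list would terminate with the duplicate named in the message. Either way, string-level comparison only catches literal duplicates. `http://openqa.example.com` and `http://openqa.example.com/` still count as two hosts, which is the limitation behind "at least at the string level".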

#16 Updated by mkittler 6 months ago

  • Status changed from Feedback to Resolved

I've just had a brief look at the recent OSD logs and it is not happening anymore. I'd say making the worker fail in this case is a different issue.

#17 Updated by coolo 5 months ago

  • Target version changed from Ready to Done
