action #19424
closed
[tools] logwarn: [websockets:error] Worker not found for given connection during connection close
Added by mgriessmeier over 7 years ago.
Updated about 5 years ago.
Category:
Feature requests
Description
Following message appears quite often lately in the logs:
[Tue May 30 06:10:09 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:10:41 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:14 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:15:46 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:19 2017] [websockets:error] Worker not found for given connection during connection close
[Tue May 30 06:20:51 2017] [websockets:error] Worker not found for given connection during connection close
[...]
- Status changed from New to In Progress
This persisted until yesterday evening. The last occurrence I can currently see in the logfile:
[Fri Jun 2 23:32:27 2017] [websockets:error] Worker not found for given connection during connection close
I looked at the source code itself but could not find a good way to improve the error message so that it points to a specific problem or worker.
- Related to action #21836: [tools][sprint 201709.1] Many "A message received from unknown worker connection" log entries on openqa.suse.de added
- Priority changed from Normal to High
- Target version set to Ready
- Priority changed from High to Normal
Is this still happening? I couldn't see anything in the recent web socket server logs on OSD. I'm not sure why this problem would occur occasionally, but it should be easy to rewrite the code to get rid of it (using the same pattern as in the developer mode code).
- Category changed from 132 to Feature requests
- Status changed from In Progress to Workable
I just checked the logs on o3 with
sudo grep 'Worker not found for given connection' /var/log/openqa
and still found a lot of these messages.
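To see whether the message is fading out or still frequent, the grep above can be turned into a per-day count. This is a minimal Python sketch, assuming every log line starts with the "[Tue May 30 06:10:09 2017]" timestamp prefix seen in the excerpts in the description; count_per_day is a hypothetical helper, not part of openQA.

```python
import re
from collections import Counter

# The message text from the log excerpts in this ticket.
PATTERN = "Worker not found for given connection during connection close"

# Assumed timestamp prefix, e.g. "[Tue May 30 06:10:09 2017]" or "[Fri Jun 2 23:32:27 2017]".
STAMP = re.compile(r"^\[\w{3} (\w{3})\s+(\d{1,2}) \d{2}:\d{2}:\d{2} (\d{4})\]")


def count_per_day(lines):
    """Count occurrences of PATTERN per day (hypothetical helper).

    Lines without the message, or without a recognizable timestamp, are skipped.
    """
    counts = Counter()
    for line in lines:
        if PATTERN not in line:
            continue
        match = STAMP.match(line)
        if match:
            month, day, year = match.groups()
            counts[f"{year} {month} {int(day):02d}"] += 1
    return counts
```

Since a file object iterates over its lines, count_per_day(open("/var/log/openqa")) (run with sudo, as with the grep) returns a Counter keyed by day, which makes it easy to confirm that the count drops to zero after a fix is deployed.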
- Status changed from Workable to In Progress
- Assignee set to mkittler
Since I'm looking at the web socket server code right now anyway, I'll have a look.
PR merged. I am not sure whether the PR alone is enough to resolve this ticket, though.
- Status changed from In Progress to Feedback
Me neither. I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production). So let's see whether this also fixes the production case, which, if my theory is correct, is the worker trying to reconnect before the web socket server has handled the previous disconnect.
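The suspected race can be illustrated with a toy model. This is a minimal Python sketch, not openQA's actual web socket server code: it assumes the server keeps one active connection per worker and, on close, looks the worker up by connection. ConnectionRegistry, register, on_close and tolerate_stale are hypothetical names. A reconnect (or a second simultaneous connection from a duplicated host entry) replaces the stored connection, so when the close event of the old connection is processed, no worker maps to it any more.

```python
class ConnectionRegistry:
    """Hypothetical model: maps worker IDs to their currently active connection."""

    def __init__(self):
        self.connection_by_worker = {}
        self.errors = []

    def register(self, worker_id, connection):
        # A reconnect replaces whatever connection was stored before.
        self.connection_by_worker[worker_id] = connection

    def _worker_for(self, connection):
        for worker_id, conn in self.connection_by_worker.items():
            if conn is connection:
                return worker_id
        return None

    def on_close(self, connection, tolerate_stale=False):
        worker_id = self._worker_for(connection)
        if worker_id is None:
            # The connection was already replaced by a newer one, so this is
            # a stale close event rather than a real inconsistency.
            if not tolerate_stale:
                self.errors.append(
                    "Worker not found for given connection during connection close")
            return
        del self.connection_by_worker[worker_id]


old, new = object(), object()
registry = ConnectionRegistry()
registry.register(42, old)
registry.register(42, new)  # the worker reconnects first ...
registry.on_close(old)      # ... then the old close event is processed
print(registry.errors)      # ['Worker not found for given connection during connection close']
print(registry.connection_by_worker[42] is new)  # True
```

Note that the stale close leaves the new connection untouched; the only effect is the error entry. That suggests treating this case as expected (tolerate_stale=True in the sketch) rather than as an error.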
mkittler wrote:
I could only reproduce this issue by configuring a worker to connect to the same web UI twice at the same time (which is unlikely to happen in production)
Could an admin really configure the worker this way by mistake, or would it require changing the code?
If you add a host twice to the worker config, the worker will connect twice as if they were different hosts; no de-duplication happens. (I guess this was also the case before my worker restructuring. At least there was no explicit code for de-duplication.)
I could of course change it so that the hostname/URL is de-duplicated, at least at the string level.
I would rather have the worker die hard on this configuration error; let's not try to be too smart in the code when the admin messes up :)
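Failing hard could look roughly like this: validate the host list once at startup and abort on a duplicate instead of silently de-duplicating. This is a minimal Python sketch, assuming the hosts are given as a whitespace-separated string; parse_hosts is a hypothetical helper, and the comparison is purely string-level (only a trailing slash is ignored), so "localhost" and "http://localhost" would still be treated as different hosts.

```python
def parse_hosts(value):
    """Split a whitespace-separated host list and abort on duplicates.

    Hypothetical helper: the string-level normalization only strips a
    trailing slash; it does not resolve hostnames or add URL schemes.
    """
    hosts = []
    seen = set()
    for raw in value.split():
        host = raw.rstrip("/")
        if host in seen:
            # Die hard instead of trying to be smart about the misconfiguration.
            raise SystemExit(f"Duplicate host in worker configuration: {host}")
        seen.add(host)
        hosts.append(host)
    return hosts
```

Raising SystemExit at startup makes the worker exit with a non-zero status and print the message, so the misconfiguration is visible immediately instead of surfacing later as confusing web socket log entries.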
- Status changed from Feedback to Resolved
I've just had a brief look at the recent OSD logs and it is not happening anymore. I'd say making the worker fail in this case is a different issue.
- Target version changed from Ready to Done