Project

General

Profile

action #21836

Updated by nicksinger over 7 years ago

tldr; we have too many "A message received from unknown worker connection" in our logs. These are caused by workers still trying to connect. 

 * Why does this happen exactly? 
 * We need to increase the information of the message 

 Right now we've many log entries like these. I've highlighted the one in question and added a little more context to it: this: 

     [Wed [Tue Aug    9 10:04:20 8 13:31:34 2017] [websockets:error] Worker not found for given [11919:info] AMQP connection during established 
     [Tue Aug    8 13:31:34 2017] [11918:info] AMQP connection close established 
     [Wed [Tue Aug    9 10:04:20 8 13:31:35 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:20 8 13:31:36 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:21 8 13:31:37 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:22 8 13:31:38 2017] [5391:info] [22836:info] Worker 30039 11896 stopped 
     [Wed [Tue Aug    9 10:04:22 8 13:31:38 2017] [18822:info] [11925:info] Worker 18822 11925 started 
     [Wed [Tue Aug    9 10:04:22 8 13:31:38 2017] [18822:info] [11925:info] Connecting to AMQP server 
     [Wed [Tue Aug    9 10:04:22 8 13:31:38 2017] [30051:info] [11925:info] AMQP connection established 
     [Tue Aug    8 13:31:38 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:38 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:40 2017] [9888:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 637 
     [Wed [Tue Aug    9 10:04:22 8 13:31:41 2017] [18822:info] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:41 2017] [22836:info] Worker 11607 stopped 
     [Tue Aug    8 13:31:41 2017] [11927:info] Worker 11927 started 
     [Tue Aug    8 13:31:41 2017] [11927:info] Connecting to AMQP server 
     [Tue Aug    8 13:31:41 2017] [11927:info] AMQP connection established 
     [Wed [Tue Aug    9 10:04:22 8 13:31:41 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:22 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:22 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:23 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:24 8 13:31:43 2017] [websockets:error] Worker not found for given connection during connection close 
     --> [Wed [Tue Aug    9 10:04:24 8 13:31:44 2017] [websockets:warn] A message received from unknown worker connection 
     [Wed Aug    9 10:04:26 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:26 8 13:31:46 2017] [11918:info] Got status update for job 1098505 that does not belong to Worker 637 
     [Tue Aug    8 13:31:46 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:27 8 13:31:47 2017] [30051:info] [11916:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 
     [Wed [Tue Aug    9 10:04:27 8 13:31:47 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:28 8 13:31:51 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:30 8 13:31:52 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:32 8 13:31:52 2017] [3650:info] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:52 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:54 2017] [11879:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 
     [Wed [Tue Aug    9 10:04:33 8 13:31:54 2017] [17486:info] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:57 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:57 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:31:58 2017] [websockets:error] Worker not found for given connection during connection close 
     [Tue Aug    8 13:32:00 2017] [11919:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 
     [Wed [Tue Aug    9 10:04:33 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:33 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:36 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:37 8 13:32:01 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:38 8 13:32:01 2017] [22836:info] Worker 11927 stopped 
     [Tue Aug    8 13:32:01 2017] [11945:info] Worker 11945 started 
     [Tue Aug    8 13:32:01 2017] [11945:info] Connecting to AMQP server 
     [Tue Aug    8 13:32:01 2017] [11945:info] AMQP connection established 
     [Tue Aug    8 13:32:02 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed [Tue Aug    9 10:04:38 8 13:32:02 2017] [17486:info] Got status update [websockets:error] Worker not found for job 1099197 that does given connection during connection close 
     [Tue Aug    8 13:32:02 2017] [websockets:error] Worker not belong to found for given connection during connection close 
     [Tue Aug    8 13:32:04 2017] [websockets:error] Worker 384 not found for given connection during connection close 
     [Tue Aug    8 13:32:05 2017] [websockets:error] Worker not found for given connection during connection close 

 Excerpt about this issue: 

     13:10 <nsinger> mudler: are you two still working on the scheduler somehow? We get a lot of "A message received from unknown worker connection" in our osd logs and i'm not 100% certain if we have to act on it 
     13:13 <mudler> nsinger, that doesn't regard too much the scheduler changes 
     13:13 <mudler> nsinger, was the connection dropped from the worker? 
     13:13 <mudler> e.g. similar to https://progress.opensuse.org/issues/21028#change-59178 
     13:16 <nsinger> mudler: have not checked further yet. In the logwarn mail i only get the line posted above but maybe it ignores the context 
     13:16 <mudler> a bit of context would help to debug the issue further 
     13:17 <nsinger> mudler: just checked; http://paste.suse.de/24302 
     13:17 <nsinger> i think the "Worker 4650 stopped" could explain it :) 
     13:19 <mudler> oh i see then :)  
     13:19 <mudler> oki - good to know 
     13:20 <nsinger> mudler: i'd create a ticket if it is not already covered?  
     13:22 <mudler> well - that is partially covered, worker should re-register and re-create a new ws association inside websocket server 
     13:22 <nsinger> mudler: can you give me a link please? Then i could add this line inside our ignore-list 
     13:22 <mudler> i wonder why worker kept sending message 
     13:24 <mudler> https://github.com/os-autoinst/openQA/blob/70954b894d39e3215d6db38b743f4cc1680c2a2b/lib/OpenQA/WebSockets/Server.pm#L192 
     13:24 <nsinger> mudler: honestly, i've no clue if these lines in the log are even related. The "A message received"-message is pretty useless. Maybe we should increase information first here? 
     13:24 <mudler> nsinger, but i don't mind if you open an issue and assign to me - so i remember to re-check later 
     13:24 <mudler> true, it's pretty    though to debug like so 
     13:25 <mudler> but hold on a bit touching those files because i'm woring on a PR that touches them as well 
     13:25 <mudler> working* 
     13:25 <mudler> (i would avoid merge mess! :p ) 
     13:26 <nsinger> mudler: if you take care of improving the message, i'm happy not touching these files ;) 
     13:26 <mudler> nsinger, sure :) will add to the changes then 
     13:28 <mudler> nsinger, it's cool if i will let it dump the message? 
     13:29 <nsinger> mudler: what do you mean? "dump" as in: remove it completely? 
     13:30 <mudler> sorry - meant dump the full output in the logs, it could be a bit cluttering but if we want to see what's happening it would be the best 
     13:31 <nsinger> mudler: fine with me but please lower the loglevel to at least debug then :P 
     13:32 <mudler> nsinger, then i'll print the full message in log debug and keep the same error message as for now in warn

Back