action #21836
Updated by nicksinger over 7 years ago
tldr; we have too many "A message received from unknown worker connection" in our logs. These are caused by workers still trying to connect. * Why does this happen exactly? * We need to increase the information of the message Right now we've many log entries like these. I've highlighted the one in question and added a little more context to it: this: [Wed [Tue Aug 9 10:04:20 8 13:31:34 2017] [websockets:error] Worker not found for given [11919:info] AMQP connection during established [Tue Aug 8 13:31:34 2017] [11918:info] AMQP connection close established [Wed [Tue Aug 9 10:04:20 8 13:31:35 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:20 8 13:31:36 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:21 8 13:31:37 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:22 8 13:31:38 2017] [5391:info] [22836:info] Worker 30039 11896 stopped [Wed [Tue Aug 9 10:04:22 8 13:31:38 2017] [18822:info] [11925:info] Worker 18822 11925 started [Wed [Tue Aug 9 10:04:22 8 13:31:38 2017] [18822:info] [11925:info] Connecting to AMQP server [Wed [Tue Aug 9 10:04:22 8 13:31:38 2017] [30051:info] [11925:info] AMQP connection established [Tue Aug 8 13:31:38 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:38 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:40 2017] [9888:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 637 [Wed [Tue Aug 9 10:04:22 8 13:31:41 2017] [18822:info] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:41 2017] [22836:info] Worker 11607 stopped [Tue Aug 8 13:31:41 2017] [11927:info] Worker 11927 started [Tue Aug 8 13:31:41 2017] [11927:info] Connecting to AMQP server [Tue Aug 8 13:31:41 2017] [11927:info] AMQP connection established [Wed [Tue Aug 9 10:04:22 8 13:31:41 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:22 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:22 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:23 8 13:31:42 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:24 8 13:31:43 2017] [websockets:error] Worker not found for given connection during connection close --> [Wed [Tue Aug 9 10:04:24 8 13:31:44 2017] [websockets:warn] A message received from unknown worker connection [Wed Aug 9 10:04:26 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:26 8 13:31:46 2017] [11918:info] Got status update for job 1098505 that does not belong to Worker 637 [Tue Aug 8 13:31:46 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:27 8 13:31:47 2017] [30051:info] [11916:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 [Wed [Tue Aug 9 10:04:27 8 13:31:47 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:28 8 13:31:51 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:30 8 13:31:52 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:32 8 13:31:52 2017] [3650:info] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:52 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:54 2017] [11879:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 [Wed [Tue Aug 9 10:04:33 8 13:31:54 2017] [17486:info] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:57 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:57 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:31:58 2017] [websockets:error] Worker not found for given connection during connection close [Tue Aug 8 13:32:00 2017] [11919:info] Got status update for job 1099197 1098505 that does not belong to Worker 384 159 [Wed [Tue Aug 9 10:04:33 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:33 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:36 8 13:32:00 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:37 8 13:32:01 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:38 8 13:32:01 2017] [22836:info] Worker 11927 stopped [Tue Aug 8 13:32:01 2017] [11945:info] Worker 11945 started [Tue Aug 8 13:32:01 2017] [11945:info] Connecting to AMQP server [Tue Aug 8 13:32:01 2017] [11945:info] AMQP connection established [Tue Aug 8 13:32:02 2017] [websockets:error] Worker not found for given connection during connection close [Wed [Tue Aug 9 10:04:38 8 13:32:02 2017] [17486:info] Got status update [websockets:error] Worker not found for job 1099197 that does given connection during connection close [Tue Aug 8 13:32:02 2017] [websockets:error] Worker not belong to found for given connection during connection close [Tue Aug 8 13:32:04 2017] [websockets:error] Worker 384 not found for given connection during connection close [Tue Aug 8 13:32:05 2017] [websockets:error] Worker not found for given connection during connection close Excerpt about this issue: 13:10 <nsinger> mudler: are you two still working on the scheduler somehow? We get a lot of "A message received from unknown worker connection" in our osd logs and i'm not 100% certain if we have to act on it 13:13 <mudler> nsinger, that doesn't regard too much the scheduler changes 13:13 <mudler> nsinger, was the connection dropped from the worker? 13:13 <mudler> e.g. similar to https://progress.opensuse.org/issues/21028#change-59178 13:16 <nsinger> mudler: have not checked further yet. In the logwarn mail i only get the line posted above but maybe it ignores the context 13:16 <mudler> a bit of context would help to debug the issue further 13:17 <nsinger> mudler: just checked; http://paste.suse.de/24302 13:17 <nsinger> i think the "Worker 4650 stopped" could explain it :) 13:19 <mudler> oh i see then :) 13:19 <mudler> oki - good to know 13:20 <nsinger> mudler: i'd create a ticket if it is not already covered? 13:22 <mudler> well - that is partially covered, worker should re-register and re-create a new ws association inside websocket server 13:22 <nsinger> mudler: can you give me a link please? Then i could add this line inside our ignore-list 13:22 <mudler> i wonder why worker kept sending message 13:24 <mudler> https://github.com/os-autoinst/openQA/blob/70954b894d39e3215d6db38b743f4cc1680c2a2b/lib/OpenQA/WebSockets/Server.pm#L192 13:24 <nsinger> mudler: honestly, i've no clue if these lines in the log are even related. The "A message received"-message is pretty useless. Maybe we should increase information first here? 13:24 <mudler> nsinger, but i don't mind if you open an issue and assign to me - so i remember to re-check later 13:24 <mudler> true, it's pretty though to debug like so 13:25 <mudler> but hold on a bit touching those files because i'm woring on a PR that touches them as well 13:25 <mudler> working* 13:25 <mudler> (i would avoid merge mess! :p ) 13:26 <nsinger> mudler: if you take care of improving the message, i'm happy not touching these files ;) 13:26 <mudler> nsinger, sure :) will add to the changes then 13:28 <mudler> nsinger, it's cool if i will let it dump the message? 13:29 <nsinger> mudler: what do you mean? "dump" as in: remove it completely? 13:30 <mudler> sorry - meant dump the full output in the logs, it could be a bit cluttering but if we want to see what's happening it would be the best 13:31 <nsinger> mudler: fine with me but please lower the loglevel to at least debug then :P 13:32 <mudler> nsinger, then i'll print the full message in log debug and keep the same error message as for now in warn