action #23536

Updated by nicksinger over 6 years ago

Since roughly the time the scheduler was heavily modified, the following error has regularly appeared in the openQA log files: 

     [Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 

 Context from the log file: 

     [Wed Aug 23 09:56:21 2017] [websockets:error] Worker not found for given connection during connection close 
     [Wed Aug 23 09:56:22 2017] [3069:info] Stopping worker 16795 gracefully (800 seconds) 
     [Wed Aug 23 09:56:22 2017] [23576:info] Worker 23576 started 
     [Wed Aug 23 09:56:22 2017] [23576:info] Connecting to AMQP server 
     [Wed Aug 23 09:56:22 2017] [3069:info] Worker 16795 stopped 
     [Wed Aug 23 09:56:22 2017] [23576:info] AMQP connection established 
     [Wed Aug 23 09:56:24 2017] [3069:info] Stopping worker 16889 gracefully (800 seconds) 
     [Wed Aug 23 09:56:24 2017] [23578:info] Worker 23578 started 
     [Wed Aug 23 09:56:24 2017] [23578:info] Connecting to AMQP server 
     [Wed Aug 23 09:56:24 2017] [23578:info] AMQP connection established 
     [Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 
     [Wed Aug 23 09:56:24 2017] [18942:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 
     [Wed Aug 23 09:56:24 2017] [13669:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 
     [Wed Aug 23 09:56:24 2017] [22897:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 
     [Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 4fb/518/987743821945823012420a62bd.png 
     [Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 83c/f60/e85ad7da4f25b1eb96f0680aa9.png 
     [Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 581/fd4/bdf3965a15065f31523cef9463.png 
     [Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 7ad/564/ae42ed15f1806c65f71cbfd4f9.png 
     [Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 898/5a6/6807e3c39b97dd10458dc2b70d.png 
     [Wed Aug 23 09:56:25 2017] [3069:info] Stopping worker 5110 gracefully (800 seconds) 
     [Wed Aug 23 09:56:25 2017] [3069:info] Worker 5110 stopped 
     [Wed Aug 23 09:56:25 2017] [23579:info] Worker 23579 started 
     [Wed Aug 23 09:56:25 2017] [23579:info] Connecting to AMQP server 

 All log lines matching one of the workers that raised this message (PID 11197): 

     [Wed Aug 23 08:57:57 2017] [11197:info] Worker 11197 started 
     [Wed Aug 23 08:57:57 2017] [11197:info] Connecting to AMQP server 
     [Wed Aug 23 08:57:57 2017] [11197:info] AMQP connection established 
     [Wed Aug 23 09:13:00 2017] [7513:debug] removing screenshot a4f/fc9/e72271244111977be85ad8dcc1.png 
     [Wed Aug 23 09:53:30 2017] [11197:info] Got status update for job 1125270 that does not belong to Worker 543 
     [Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. 
     [Wed Aug 23 09:56:39 2017] [3069:info] Stopping worker 11197 gracefully (800 seconds) 
     [Wed Aug 23 09:56:39 2017] [3069:info] Worker 11197 stopped 
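
The excerpt above includes an unrelated `7513:debug` line, because the digits `11197` happen to occur inside a screenshot file name. As a minimal sketch (not part of openQA; the function name `lines_for_worker` and the assumed `[timestamp] [pid:level] message` layout are illustrative only), a filter that matches on the PID field or on the PID as a whole number in the message avoids such false hits:

```python
import re

# Assumed layout: "[timestamp] [pid:level] message".
# Lines whose tag is not numeric (e.g. "[websockets:error]") are skipped.
LOG_LINE = re.compile(
    r"^\s*\[(?P<ts>[^\]]+)\] \[(?P<pid>\d+):(?P<level>\w+)\] (?P<msg>.*)$"
)

def lines_for_worker(lines, pid):
    """Return lines logged by `pid`, or mentioning `pid` as a whole number."""
    mention = re.compile(rf"\b{pid}\b")
    selected = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        if m.group("pid") == str(pid) or mention.search(m.group("msg")):
            selected.append(line.rstrip())
    return selected

sample = [
    "[Wed Aug 23 08:57:57 2017] [11197:info] Worker 11197 started",
    "[Wed Aug 23 09:13:00 2017] [7513:debug] removing screenshot a4f/fc9/e72271244111977be85ad8dcc1.png",
    "[Wed Aug 23 09:56:39 2017] [3069:info] Stopping worker 11197 gracefully (800 seconds)",
    "[Wed Aug 23 09:56:25 2017] [3069:info] Worker 5110 stopped",
]

for line in lines_for_worker(sample, 11197):
    print(line)
```

With the four sample lines, only the first and third are selected; the screenshot line is dropped because `11197` is embedded in a longer hexadecimal string there.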



 Unfortunately I cannot see what exactly is causing the issue here. 

 ### Suggestions on how to improve this message: 
 * If possible, include more specific reasons for the error (what is the context of the message? What did the worker try to do?) 

 If this message is critical (not self-recovering): 

 * Add hints on where an admin could look for more information 
 * Expand the message to tell the admin: "Hey, something just broke - you need to intervene" 


 If this message should just inform the admin: 

 * Lower the log level to at most "warn" 
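
The suggestions above could look roughly like the following. This is only an illustration in Python, not openQA code: the helper `call_with_context`, the logger name, the `hint` text, and the stand-in function `notify_scheduler` are all hypothetical. The idea is to record what was being attempted, and to pick the log level depending on whether the failure is self-recovering:

```python
import logging

log = logging.getLogger("openqa-sketch")  # hypothetical logger name

def call_with_context(action, func, *args, critical=False, hint=None):
    """Run func(*args); on failure, log what was being attempted.

    critical=False: log at warning level and return None (self-recovering).
    critical=True:  log at error level with a hint for the admin, then re-raise.
    """
    try:
        return func(*args)
    except Exception as err:
        message = f"{action} failed: {err}"
        if critical:
            # Not self-recovering: say that intervention is needed and where to look.
            message += " (manual intervention required"
            message += f"; {hint})" if hint else ")"
            log.error(message)
            raise
        # Informational only: keep the level at "warn" at most.
        log.warning(message)
        return None

def notify_scheduler(job_id):
    # Hypothetical stand-in for a remote call that times out.
    raise TimeoutError("Did not receive a reply")

call_with_context("notifying scheduler about job 42", notify_scheduler, 42)
```

In the non-critical case the log then contains the attempted action next to the raw error text, instead of the bare `NoReply` message.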
