action #23536
Updated by nicksinger over 6 years ago
Since roughly the time of the heavy modification of the scheduler, the following error appears regularly in the openQA log files:

[Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Context from the log file:

[Wed Aug 23 09:56:21 2017] [websockets:error] Worker not found for given connection during connection close
[Wed Aug 23 09:56:22 2017] [3069:info] Stopping worker 16795 gracefully (800 seconds)
[Wed Aug 23 09:56:22 2017] [23576:info] Worker 23576 started
[Wed Aug 23 09:56:22 2017] [23576:info] Connecting to AMQP server
[Wed Aug 23 09:56:22 2017] [3069:info] Worker 16795 stopped
[Wed Aug 23 09:56:22 2017] [23576:info] AMQP connection established
[Wed Aug 23 09:56:24 2017] [3069:info] Stopping worker 16889 gracefully (800 seconds)
[Wed Aug 23 09:56:24 2017] [23578:info] Worker 23578 started
[Wed Aug 23 09:56:24 2017] [23578:info] Connecting to AMQP server
[Wed Aug 23 09:56:24 2017] [23578:info] AMQP connection established
[Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[Wed Aug 23 09:56:24 2017] [18942:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[Wed Aug 23 09:56:24 2017] [13669:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[Wed Aug 23 09:56:24 2017] [22897:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 4fb/518/987743821945823012420a62bd.png
[Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 83c/f60/e85ad7da4f25b1eb96f0680aa9.png
[Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 581/fd4/bdf3965a15065f31523cef9463.png
[Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 7ad/564/ae42ed15f1806c65f71cbfd4f9.png
[Wed Aug 23 09:56:25 2017] [7513:debug] removing screenshot 898/5a6/6807e3c39b97dd10458dc2b70d.png
[Wed Aug 23 09:56:25 2017] [3069:info] Stopping worker 5110 gracefully (800 seconds)
[Wed Aug 23 09:56:25 2017] [3069:info] Worker 5110 stopped
[Wed Aug 23 09:56:25 2017] [23579:info] Worker 23579 started
[Wed Aug 23 09:56:25 2017] [23579:info] Connecting to AMQP server

Everything related to one of the workers that raised this message:

[Wed Aug 23 08:57:57 2017] [11197:info] Worker 11197 started
[Wed Aug 23 08:57:57 2017] [11197:info] Connecting to AMQP server
[Wed Aug 23 08:57:57 2017] [11197:info] AMQP connection established
[Wed Aug 23 09:13:00 2017] [7513:debug] removing screenshot a4f/fc9/e72271244111977be85ad8dcc1.png
[Wed Aug 23 09:53:30 2017] [11197:info] Got status update for job 1125270 that does not belong to Worker 543
[Wed Aug 23 09:56:24 2017] [11197:error] org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
[Wed Aug 23 09:56:39 2017] [3069:info] Stopping worker 11197 gracefully (800 seconds)
[Wed Aug 23 09:56:39 2017] [3069:info] Worker 11197 stopped

Unfortunately, I cannot see what exactly is causing the issue here.

### Suggestions on how to improve this message:

* If possible, include more specific reasons for the error (what is the context of the message? What did the worker try to do?)

If this message is critical (not self-recovering):

* Add hints about where an admin could look for more information
* Expand the message to tell the admin: "Hey, something just broke - you need to interact"

If this message should just inform the admin:

* Decrease the log level to at most "warn"
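
The suggestions above could look roughly like the following. This is only an illustrative sketch in Python, not openQA code: `NoReplyError`, `call_with_context`, and the `action`, `service` and `recoverable` parameters are hypothetical names invented for this example, and the real fix would have to live wherever openQA issues its D-Bus calls.

```python
import logging


class NoReplyError(Exception):
    """Hypothetical stand-in for org.freedesktop.DBus.Error.NoReply."""


def call_with_context(log, func, *, action, service, recoverable):
    """Run an IPC call and, on a missing reply, log what was being attempted.

    All parameter names are assumptions made for this sketch.
    """
    try:
        return func()
    except NoReplyError as err:
        if recoverable:
            # Informational case: nothing for the admin to do, so "warning" suffices.
            log.warning(
                "No reply from %s while trying to %s; continuing: %s",
                service, action, err,
            )
            return None
        # Critical case: say that manual action is needed and where to look.
        log.error(
            "No reply from %s while trying to %s; manual intervention needed, "
            "check that the %s service is running and inspect its log: %s",
            service, action, service, err,
        )
        raise
```

With a wrapper of this kind, a log line would name the attempted action (for example "update job status") and the unresponsive service instead of only repeating the generic D-Bus text, and the level would indicate whether an admin has to act.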