action #40793
closedPrevent isotovideo <=> command server IPC to occasionally lose messages
Description
Symptom analysis¶
33-developer-mode.t
is failing often on Travis with the following error message:
not ok 2 - web socket connection closed prematurely, was waiting for resume
# Failed test 'web socket connection closed prematurely, was waiting for resume'
# at /opt/testing_area/openqa/t/lib/OpenQA/Test/FullstackUtils.pm line 128.
On the developer console the following lines are indeed missing in the error case (compared to a good run and also what I would expect):
{"data":{"json_cmd_token":"GxmYxieq","resume_test_execution":null},"type":"info","what":"cmdsrvmsg"}
{"data":{"reload_needles":{"cmd":"backend_reload_needles","json_cmd_token":"mzZzjVLv"}},"type":"info","what":"cmdsrvmsg"}
{"data":{"check_screen":"on_prompt"},"type":"info","what":"cmdsrvmsg"}
In the autoinst-log.txt
- even in the error case - we get 3 times isotovideo: test execution will be resumed
and the test actually resumes. So the client definitely sends the message to resume correctly and the backend/isotovideo reacts accordingly. Only propagating this information back to the command server seems to fail occasionally. Hence there are also the following lines in the autoinst-log.txt
missing in the error case:
[2018-09-10T15:04:59.0788 CEST] [debug] BYTES {"resume_test_execution":null,"json_cmd_token":"IetuXGbf"}
[2018-09-10T15:04:59.0789 CEST] [debug] BYTES {"reload_needles":{"cmd":"backend_reload_needles","json_cmd_token":"qyQbPVBm"},"json_cmd_token":"BCZHrjDC"}
Those lines should be printed by the command server after propagating the commands to all clients.
Deduction¶
It seems that the IPC between isotovideo and the command server doesn't work reliably.
Tasks¶
- Improve log messages and error handling (eg.
BYTES ...
is not a meaningful context for the JSON document) - Find the conditions for "occasionally", try to reproduce locally (if possible)
- Fix the IPC
Updated by mkittler about 6 years ago
- Subject changed from Fix occational problems of isotovideo <=> command server IPC loosing messages to Prevent isotovideo <=> command server IPC to occasionally lose messages
- Status changed from New to In Progress
actual PR: https://github.com/os-autoinst/os-autoinst/pull/1022
for openQA's CI: https://github.com/os-autoinst/openQA/pull/1782
Updated by coolo about 6 years ago
- Status changed from In Progress to Resolved
- Target version set to Done