Project

General

Profile

Actions

action #40793

closed

Prevent isotovideo <=> command server IPC to occasionally lose messages

Added by mkittler over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2018-09-10
Due date:
% Done:

0%

Estimated time:

Description

Symptom analysis

33-developer-mode.t is failing often on Travis with the following error message:

    not ok 2 - web socket connection closed prematurely, was waiting for resume
    #   Failed test 'web socket connection closed prematurely, was waiting for resume'
    #   at /opt/testing_area/openqa/t/lib/OpenQA/Test/FullstackUtils.pm line 128.

On the developer console the following lines are indeed missing in the error case (compared to a good run and also what I would expect):

{"data":{"json_cmd_token":"GxmYxieq","resume_test_execution":null},"type":"info","what":"cmdsrvmsg"}
{"data":{"reload_needles":{"cmd":"backend_reload_needles","json_cmd_token":"mzZzjVLv"}},"type":"info","what":"cmdsrvmsg"}
{"data":{"check_screen":"on_prompt"},"type":"info","what":"cmdsrvmsg"}

In the autoinst-log.txt - even in the error case - we get 3 times isotovideo: test execution will be resumed and the test actually resumes. So the client definitely sends the message to resume correctly and the backend/isotovideo reacts accordingly. Only propagating this information back to the command server seems to fail occasionally. Hence there are also the following lines in the autoinst-log.txt missing in the error case:

[2018-09-10T15:04:59.0788 CEST] [debug] BYTES {"resume_test_execution":null,"json_cmd_token":"IetuXGbf"}
[2018-09-10T15:04:59.0789 CEST] [debug] BYTES {"reload_needles":{"cmd":"backend_reload_needles","json_cmd_token":"qyQbPVBm"},"json_cmd_token":"BCZHrjDC"}

Those lines should be printed by the command server after propagating the commands to all clients.

Deduction

It seems that the IPC between isotovideo and the command server doesn't work reliably.

Tasks

  1. Improve log messages and error handling (eg. BYTES ... is not a meaningful context for the JSON document)
  2. Find the conditions for "occasionally", try to reproduce locally (if possible)
  3. Fix the IPC
Actions

Also available in: Atom PDF