action #36442
coordination #14818 (closed): [EPIC] Interactive mode is an usability disaster
Access to running SUTs for System Developers
Description
As a System Developer I want openQA to pause a job before a specific test module is executed
so that I can go step by step through the test module and manually collect logs when I can reproduce the error.
- AC1: System Developer is able to specify which module the job has to pause at.
- AC1.1: A variable can be used to define a pause / developer entry point
- AC1.2: Scheduled and running jobs offer the possibility to define a pause / developer entry point
- AC1.3: Jobs in assigned state do not offer the possibility to define a pause / developer entry point
- AC2: System Developer is able to interact with the SUT.
- AC2.1: System Developer is able to connect to the SUT via VNC.
- AC2.2: For a SUT that is waiting for a developer to connect, the webUI shows a message with the connection instructions
- AC3: System Developer is able to share information between the SUT and another machine.
Updated by szarate over 6 years ago
During the sprint planning we agreed to discuss the implementation details for this feature.
There is a PoC at https://github.com/os-autoinst/os-autoinst/pull/958 for a proxy to the os-autoinst command server, so that Lea can efficiently send openQA instructions to isotovideo via the openQA testapi. But Lea also wants to be able to interact with the SUT directly (in fact, this is what she is most interested in).
PR https://github.com/os-autoinst/openQA/pull/1632 is related to the removal of the current interactive mode.
Updated by szarate over 6 years ago
PO is suggesting to have a WS proxy that allows communication user <==> webUI <==> command server <==> isotovideo
So the first step: have a WS connection from the webUI to isotovideo with commands that fetch the currently running testapi command.
For that we need isotovideo changes + a WS proxy within the webUI + webUI changes.
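To make the proposed chain a bit more concrete, here is a minimal sketch of the relaying part only. It is not openQA or os-autoinst code: plain asyncio queues stand in for the two web socket connections, and the names `relay`, `proxy` and `demo` as well as the message payloads are made up for illustration.

```python
import asyncio


async def relay(source, sink):
    """Copy messages from one queue to another until a None sentinel arrives."""
    while True:
        msg = await source.get()
        await sink.put(msg)  # the sentinel is forwarded so the peer can stop too
        if msg is None:
            return


async def proxy(from_browser, to_browser, from_cmd_server, to_cmd_server):
    """Run both directions concurrently, as a WS proxy inside the webUI might."""
    await asyncio.gather(
        relay(from_browser, to_cmd_server),
        relay(from_cmd_server, to_browser),
    )


async def demo():
    from_browser, to_browser = asyncio.Queue(), asyncio.Queue()
    from_cmd_server, to_cmd_server = asyncio.Queue(), asyncio.Queue()
    # Hypothetical payloads; the real message format is not specified in this ticket.
    for msg in ("what is the current testapi command?", None):
        from_browser.put_nowait(msg)
    for msg in ("currently running: <testapi command>", None):
        from_cmd_server.put_nowait(msg)
    await proxy(from_browser, to_browser, from_cmd_server, to_cmd_server)
    # First message that arrived on each side of the proxy.
    return to_cmd_server.get_nowait(), to_browser.get_nowait()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is that the proxy does not need to understand the messages. It only forwards them in both directions, which is also where authentication could be enforced.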
Updated by mkittler over 6 years ago
Today I played with the web socket console (https://github.com/os-autoinst/openQA/pull/1661) to set a test variable when the test is running and then pause autotest if the value of the variable matches the test name.
I noticed that this task isn't that easy. It is quite hard to understand how it works due to the multi-process architecture. My understanding of the different processes so far (please correct me if I'm wrong):
process:          relevant Perl file(s):      what it does:
----------------------------------------------------------------------------------------------------
* isotovideo      isotovideo                  spawns further processes, IO-loop for passing some
                                              commands (main occupation), cleanup
* backend         baseclass.pm and derived,   spawns and handles backend (eg. qemu), can receive
                  console.pm and derived      commands from isotovideo IO-loop
* qemu
* videoencoder
* autotest        autotest.pm, testapi.pm,    determines test order, runs test code and thus calls
                  console_proxy.pm,           testapi functions, sends commands to isotovideo
                  basetest.pm and derived     IO-loop (via query_isotovideo)
* command server  commands.pm                 provides GET/POST HTTP routes and ws server, passes
                                              commands received via web sockets to isotovideo IO-loop
So the variable value I'd like to change must be passed from the command server to autotest. I still couldn't figure out how to do it. The autotest process seems to be able only to send commands, not to receive them. Since it runs test code most of the time, it is not possible to handle any async events there. So I tried to make it simply reload the variables before starting a test, but haven't had any success either (maybe my code is just wrong).
Updated by coolo over 6 years ago
No, you can't rely on autotest. autotest is blocking - it's basically driving the execution, everything else is just reacting to it.
It's isotovideo's main loop that is relevant here - it will get a callback from autotest when the current test changes (cmd=>set_current_test). At this point it needs to stop taking commands from autotest and pretend the execution takes very long.
E.g. autotest might call cmd=>backend_type_string or cmd=>check_check - and those will just take a long time as isotovideo is no longer passing it to the backend.
Take your time to understand the architecture properly - and keeping your understanding documented is a good idea.
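The pausing behaviour described in this comment can be sketched roughly as follows. This is a toy model, not the actual isotovideo code (which is Perl): the class `MainLoop`, the `pause_at` setting and the message layout are assumptions made for illustration. Only the command names `set_current_test` and `backend_type_string` are taken from the comment above.

```python
from collections import deque


class MainLoop:
    """Toy stand-in for the isotovideo IO loop; not the real os-autoinst code."""

    def __init__(self, pause_at=None):
        self.pause_at = pause_at  # hypothetical setting: module to pause before
        self.paused = False
        self.deferred = deque()   # autotest commands held back while paused
        self.forwarded = []       # commands that reached the (imaginary) backend

    def handle_autotest(self, msg):
        cmd = msg["cmd"]
        if cmd == "set_current_test":
            # autotest announces the next module; decide whether to pause here
            if msg.get("name") == self.pause_at:
                self.paused = True
            return
        if self.paused:
            # No reply is sent: from autotest's point of view the call just takes long.
            self.deferred.append(msg)
            return
        self.forwarded.append(cmd)

    def resume(self):
        """Called on behalf of the developer; replays the held-back commands in order."""
        self.paused = False
        while self.deferred:
            self.forwarded.append(self.deferred.popleft()["cmd"])


loop = MainLoop(pause_at="boot")
loop.handle_autotest({"cmd": "set_current_test", "name": "boot"})
loop.handle_autotest({"cmd": "backend_type_string", "text": "root\n"})
assert loop.forwarded == []  # held back while paused
loop.resume()
assert loop.forwarded == ["backend_type_string"]
```

Because autotest blocks on every command it sends, simply not answering is enough to hold the test at the module boundary. No change to autotest itself is needed in this model.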
Updated by mkittler over 6 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
PRs:
- MERGED: pausing/resuming jobs in os-autoinst: https://github.com/os-autoinst/os-autoinst/pull/966
- MERGED: development web socket console and extended fullstack test: https://github.com/os-autoinst/openQA/pull/1667
- WIP: add UI controls, add more tests: https://github.com/os-autoinst/openQA/pull/1691
Updated by mkittler over 6 years ago
- It is now deployed on e212: http://e212.suse.de/tests/12240#live (Not the most recent commit, currently rebuilding the package.)
- There are now 4 different tests related to the developer mode feature:
  - full-stack.t: The regular full stack test connects directly to the os-autoinst websocket server to pause and resume a test while it is running. It tests everything via the developer console.
  - 33-developer_mode.t: The developer full stack test connects via the proxy and pauses and resumes a test while it is running. It also tests whether the session is locked for other users. It tests mostly via the developer console.
  - 34-developer_mode-unit.t: Unit tests for developer mode related features. Tests database operations, the ws proxy and other methods of the LiveViewHandler.pm controller. The connection to os-autoinst is always mocked.
  - ui/25-developer_mode.t: Tests only the UI layer (mostly JavaScript code). Everything else (eg. any web socket connections) is mocked.
Still TODO:
- Cancel the job when the developer session is canceled DONE
- Additionally, the developer session shouldn't be unregistered so the responsible developer can always be tracked. DONE
- Losing the web socket connection to os-autoinst shouldn't cancel the developer session anymore. Let's see how to handle the reconnect to os-autoinst then. DONE
- Likely fixing bugs and tests.
- Check the test coverage, add more tests if needed.
- More tests on staging instance.
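The session rules mentioned above (session locked for other users, developer stays registered, a lost connection does not end the session) can be illustrated with a small in-memory model. This is only a sketch under assumptions: the class `DeveloperSessions` and its methods are hypothetical and do not reflect the actual database schema or the LiveViewHandler.pm code.

```python
class DeveloperSessions:
    """Hypothetical in-memory model of developer sessions, keyed by job id."""

    def __init__(self):
        self._owner = {}      # job id -> user who opened the session
        self._connected = {}  # job id -> whether the ws connection is currently up

    def open(self, job_id, user):
        """Claim or re-join the session; returns False if another user holds it."""
        owner = self._owner.setdefault(job_id, user)
        if owner != user:
            return False
        self._connected[job_id] = True
        return True

    def connection_lost(self, job_id):
        # Only the connection state changes; the owner stays registered
        # so the responsible developer can still be tracked.
        self._connected[job_id] = False

    def owner(self, job_id):
        return self._owner.get(job_id)

    def is_connected(self, job_id):
        return self._connected.get(job_id, False)


sessions = DeveloperSessions()
assert sessions.open(12240, "alice")
assert not sessions.open(12240, "bob")  # locked for other users
sessions.connection_lost(12240)
assert sessions.owner(12240) == "alice"  # still tracked after disconnect
```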
Updated by mkittler over 6 years ago
For the actual access we could use noVNC: https://github.com/novnc/noVNC#server-requirements
It uses VNC over web sockets, which is supported by QEMU. We could proxy that connection like the ws connection for the status. Not sure whether implementing this in Mojolicious will be efficient enough, though. It would be convenient because this way we could do the authentication as usual. Otherwise an extra access token could be generated and shared through the livehandler daemon, of course.
Updated by szarate over 6 years ago
Looks fancy enough (although I'm not a fan of node based apps :P), but from what I could read, it's just a matter of embedding the app and then passing the websocket path; Mojolicious would only take care of displaying/serving the assets... Or do you see another problem?
Updated by mkittler over 6 years ago
Pausing is implemented and there are also some instructions for accessing via VNC. Access to all kinds of tests is not implemented, though.
I'm unassigning because I suppose working on the other developer mode use cases precedes making this one more fancy.
Updated by szarate over 6 years ago
- Target version changed from Current Sprint to Ready
Updated by okurz about 6 years ago
- Related to action #42677: Keep virtual machines running on demand/failure (was: Don't just power off virtual machines upon job timeout) added
Updated by mkittler almost 6 years ago
- Related to action #45671: Improve "developer mode" connection hint for non-qemu remote machines added
Updated by okurz over 5 years ago
- Category changed from 132 to Feature requests
Updated by okurz over 5 years ago
- Status changed from In Progress to Resolved
- Assignee set to mkittler
back to "Workable". I see AC1 and AC2 fulfilled as well as implicitly "AC3: System Developer is able to share information between the SUT and another machine." considering that we have the basic possibility for VNC and other network access to the machines, e.g. one could even share data over ssh and such. I even suggest we are done here. Please reopen with a comment if thinking differently. Assigning to mkittler who did the main recent changes in this topic.