action #45191
closeddeveloper mode: error message just when clicking "Cancel job"
0%
Description
So far the user gets a not so nice error message that the connection to os-autoinst has been lost when the test finishes.
Updated by okurz over 5 years ago
- Copied from action #44249: developer mode: "Stop timeout" - like in the old interactive mode :) added
Updated by mkittler over 5 years ago
- Related to action #39227: Handle the job being finished more nicely in developer mode added
Updated by mkittler over 5 years ago
- Copied from deleted (action #44249: developer mode: "Stop timeout" - like in the old interactive mode :))
Updated by mkittler over 5 years ago
- Description updated (diff)
- Assignee set to mkittler
- Target version set to Current Sprint
Updated by mkittler over 5 years ago
Even with my previous idea this turns out to be hard to be implemented. So far I'm unable to prevent the command server from being interrupted until it informs the web socket clients.
Maybe I can also just delay showing the error message in the front-end. That would be a not nice solution but not sure whether messing with os-autoinst's IPC code is worth it.
Updated by mkittler over 5 years ago
The code in isotovideo to terminate (and eventually kill) its subprocesses is only one way those subprocesses are terminated/killed. The worker-side also attempts to terminate/kill those subprocesses. Apparently we or better mudler re-implemented his own container here. That is why I had no success with extending the livetime of the command server to send some last command.
When working on that I come to the conclusion that we really need more documentation about the whole openQA architecture (not only specific packages and methods). It occupied me quite a while to reverse-engineer/understand what's happening here.
I find it also weird to have isotovideo itself taking care about its subprocess and terminating them also on the worker-side. Additionally, the use of Mojo::IOLoop::ReadWriteProcess::Session
and Mojo::IOLoop::ReadWriteProcess::Container
is kind of subtitle considering its impact/implications.
To solve this issue I could adjust the worker so it doesn't try to terminate/kill isotovideo. Instead it sends a command to the command server which will then informs web socket clients and stops. The worker could still attempt to terminate/kill isotovideo if stopping via the command server didn't work. That way I would not have to touch any of the session/container code and just let that be the fallback.
Updated by mkittler over 5 years ago
- Status changed from New to Feedback
Before implementing the 2nd approach (and possibly scraping it again) I'd like to have some feedback.
Updated by mkittler over 5 years ago
WIP branches for 2nd approach:
- https://github.com/os-autoinst/os-autoinst/compare/master...Martchus:devel_mode_prevent_error2
- https://github.com/os-autoinst/openQA/compare/master...Martchus:devel_mode_prevent_error2
Right now I'm stuck because the HTTP request from the worker to os-autoinst command server does not work yet. Maybe @kraih can help when he's no longer sick.
Updated by mkittler over 5 years ago
- Status changed from Feedback to In Progress
It should work now:
- PR for isotovideo and command server part: https://github.com/os-autoinst/os-autoinst/pull/1093
- PR for worker and UI part: https://github.com/os-autoinst/openQA/pull/1981
Updated by mkittler over 5 years ago
- Status changed from In Progress to Resolved
PRs are merged so the error should be gone as soon as everything is deployed.
Updated by okurz almost 5 years ago
- Related to action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b added