action #45191

developer mode: error message just when clicking "Cancel job"

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Start date: 22/11/2018
Priority: Normal
Due date:
Assignee: mkittler
% Done: 0%
Category: Concrete Bugs
Target version: Current Sprint
Difficulty:
Duration:

Description

So far the user gets a rather unhelpful error message saying that the connection to os-autoinst has been lost when the test finishes.


Related issues

Related to openQA Project - action #39227: Handle the job being finished more nicely in developer mode Resolved 06/08/2018
Related to openQA Project - action #57707: isotovideo fails to terminate cleanly, message "isotovide... Resolved 04/10/2019

History

#1 Updated by okurz about 1 year ago

  • Copied from action #44249: developer mode: "Stop timeout" - like in the old interactive mode :) added

#2 Updated by mkittler about 1 year ago

  • Related to action #39227: Handle the job being finished more nicely in developer mode added

#3 Updated by mkittler about 1 year ago

  • Copied from deleted (action #44249: developer mode: "Stop timeout" - like in the old interactive mode :))

#4 Updated by mkittler about 1 year ago

  • Description updated (diff)
  • Assignee set to mkittler
  • Target version set to Current Sprint

#5 Updated by mkittler about 1 year ago

Even with my previous idea this turns out to be hard to implement. So far I'm unable to keep the command server from being interrupted before it has informed the web socket clients.

Maybe I could also just delay showing the error message in the front-end. That would not be a nice solution, but I'm not sure whether messing with os-autoinst's IPC code is worth it.

#6 Updated by mkittler about 1 year ago

The code in isotovideo to terminate (and eventually kill) its subprocesses is only one of the ways those subprocesses are terminated/killed. The worker side also attempts to terminate/kill those subprocesses. Apparently we, or rather mudler, re-implemented our own container here. That is why I had no success with extending the lifetime of the command server so it could send one last command.

While working on that I came to the conclusion that we really need more documentation about the whole openQA architecture (not only specific packages and methods). It took me quite a while to reverse-engineer/understand what's happening here.

I also find it weird to have isotovideo itself taking care of its subprocesses while they are also terminated on the worker side. Additionally, the use of Mojo::IOLoop::ReadWriteProcess::Session and Mojo::IOLoop::ReadWriteProcess::Container is rather subtle considering its impact/implications.


To solve this issue I could adjust the worker so it doesn't try to terminate/kill isotovideo right away. Instead it sends a command to the command server, which will then inform the web socket clients and stop. The worker could still attempt to terminate/kill isotovideo if stopping via the command server didn't work. That way I would not have to touch any of the session/container code and could just let that be the fallback.
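
The "ask first, terminate/kill as fallback" idea can be sketched roughly as follows. This is only an illustration in Python, not the actual worker code (which is Perl). The names `stop_child`, `request_stop` and `grace` are made up for this example; `request_stop` stands in for whatever request the worker would send to the command server.

```python
import subprocess
import sys


def stop_child(proc, request_stop, grace=5.0):
    """Stop a child process politely first, forcefully only as a fallback.

    proc         -- a subprocess.Popen object (stands in for isotovideo)
    request_stop -- hypothetical callable asking the child to stop on its own,
                    e.g. by sending a request to its command server
    grace        -- seconds to wait after each step before escalating
    """
    try:
        request_stop()
        proc.wait(timeout=grace)
        return "graceful"
    except (OSError, subprocess.TimeoutExpired):
        # The request failed or the child did not exit in time: fall back.
        pass

    proc.terminate()
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return "killed"
```

The return value only tells the caller which step finally stopped the child. The point of the design is that the child gets a chance to notify its own clients before it goes away, while the existing terminate/kill path stays untouched as a safety net.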

#7 Updated by mkittler about 1 year ago

  • Status changed from New to Feedback

Before implementing the 2nd approach (and possibly scrapping it again) I'd like to have some feedback.

#8 Updated by mkittler about 1 year ago

WIP branches for 2nd approach:

Right now I'm stuck because the HTTP request from the worker to os-autoinst command server does not work yet. Maybe @kraih can help when he's no longer sick.

#9 Updated by mkittler about 1 year ago

  • Status changed from Feedback to In Progress

It should work now:

#10 Updated by mkittler about 1 year ago

  • Status changed from In Progress to Resolved

PRs are merged so the error should be gone as soon as everything is deployed.

#11 Updated by okurz 5 months ago

  • Related to action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b added
