action #45191

developer mode: error message just when clicking "Cancel job"

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2018-11-22
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

So far, when the test finishes, the user gets a rather unfriendly error message saying that the connection to os-autoinst has been lost.


Related issues

Related to openQA Project - action #39227: Handle the job being finished more nicely in developer mode (Resolved, 2018-08-06)

Related to openQA Project - action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b (Resolved, 2019-10-04)

History

#1 Updated by okurz almost 3 years ago

  • Copied from action #44249: developer mode: "Stop timeout" - like in the old interactive mode :) added

#2 Updated by mkittler almost 3 years ago

  • Related to action #39227: Handle the job being finished more nicely in developer mode added

#3 Updated by mkittler almost 3 years ago

  • Copied from deleted (action #44249: developer mode: "Stop timeout" - like in the old interactive mode :))

#4 Updated by mkittler almost 3 years ago

  • Description updated (diff)
  • Assignee set to mkittler
  • Target version set to Current Sprint

#5 Updated by mkittler almost 3 years ago

Even with my previous idea this turns out to be hard to implement. So far I'm unable to prevent the command server from being interrupted before it has informed the web socket clients.

Maybe I can also just delay showing the error message in the front-end. That would not be a nice solution, but I'm not sure whether messing with os-autoinst's IPC code is worth it.

#6 Updated by mkittler almost 3 years ago

The code in isotovideo that terminates (and eventually kills) its subprocesses is only one of the ways those subprocesses get terminated/killed. The worker side also attempts to terminate/kill them. Apparently we (or rather mudler) re-implemented a container of our own here. That is why I had no success with extending the lifetime of the command server so it could send one last command.

While working on that I came to the conclusion that we really need more documentation about the whole openQA architecture (not only specific packages and methods). It took me quite a while to reverse-engineer and understand what's happening here.

I also find it weird that isotovideo itself takes care of its subprocesses while they are terminated on the worker side as well. Additionally, the use of Mojo::IOLoop::ReadWriteProcess::Session and Mojo::IOLoop::ReadWriteProcess::Container is kind of subtle considering its impact/implications.


To solve this issue I could adjust the worker so it doesn't try to terminate/kill isotovideo. Instead it would send a command to the command server, which would then inform the web socket clients and stop. The worker could still attempt to terminate/kill isotovideo if stopping via the command server didn't work. That way I would not have to touch any of the session/container code and could just let that be the fallback.
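
For illustration, the "ask first, terminate/kill as fallback" flow could look roughly like the sketch below. It is written in Python only to keep it short and self-contained; the actual worker and os-autoinst code is Perl and is not reproduced here. The names `stop_backend`, `request_stop` and `grace_period` are hypothetical, and `request_stop` stands in for whatever request the worker would send to the command server.

```python
import subprocess
from typing import Callable


def stop_backend(
    proc: subprocess.Popen,
    request_stop: Callable[[], bool],
    grace_period: float = 5.0,
) -> str:
    """Stop `proc`, preferring a cooperative shutdown over signals.

    `request_stop` is a hypothetical callback that asks the command server
    to inform its web socket clients and exit; it returns True if the
    request was accepted. Returns "graceful", "terminated" or "killed".
    """
    # Step 1: ask the process to stop on its own.
    try:
        acknowledged = request_stop()
    except Exception:
        # An unreachable command server is treated like a refused request.
        acknowledged = False

    if acknowledged:
        try:
            proc.wait(timeout=grace_period)
            return "graceful"
        except subprocess.TimeoutExpired:
            pass  # It promised to stop but did not; fall through.

    # Step 2: fallback, the existing terminate behaviour.
    proc.terminate()
    try:
        proc.wait(timeout=grace_period)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass

    # Step 3: last resort, kill and reap the process.
    proc.kill()
    proc.wait()
    return "killed"
```

The point of this shape is that the terminate/kill path stays untouched and only runs when the cooperative request fails or times out, which matches the idea of leaving the session/container code as the fallback.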

#7 Updated by mkittler almost 3 years ago

  • Status changed from New to Feedback

Before implementing the 2nd approach (and possibly scrapping it again) I'd like to have some feedback.

#8 Updated by mkittler almost 3 years ago

WIP branches for 2nd approach:

Right now I'm stuck because the HTTP request from the worker to os-autoinst command server does not work yet. Maybe kraih can help when he's no longer sick.

#9 Updated by mkittler almost 3 years ago

  • Status changed from Feedback to In Progress

It should work now:

#10 Updated by mkittler almost 3 years ago

  • Status changed from In Progress to Resolved

PRs are merged so the error should be gone as soon as everything is deployed.

#11 Updated by okurz about 2 years ago

  • Related to action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b added
