action #4680 (closed)

Parent task: action #3136: support remote workers

livestream support for remote workers

Added by oholecek over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: –
Category: Feature requests
Target version: –
Start date: 2014-11-13
Due date: –
% Done: 0%
Estimated time: –

Description

The test livestream does not work with the current remote worker draft.
I don't understand the code well enough, but I guess it tries to read directly from the pool directory. This directory is, however, not shared across hosts and is present only on the worker machine. I'm quite hesitant to share pools, since the majority of I/O happens there.

What about proxying read-only VNC streams? That does not solve the live log, though.

Actions #1

Updated by coolo over 9 years ago

Proxying is exactly what OBS does. The OBS workers start a daemon and announce its port to the system, and when OBS gets a request for the log file of a currently building job, it proxies that request to the worker building it at the moment.

I would actually prefer that model to the current polling of /commands.
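
A minimal sketch of that proxy model in Mojolicious::Lite, assuming a hypothetical worker_for_job lookup and a hypothetical /log endpoint on the worker daemon (neither exists in openQA; this only illustrates the idea):

    use Mojolicious::Lite;

    # Hypothetical helper: returns the host and port the worker daemon
    # announced when it picked up this job.
    sub worker_for_job { my ($jobid) = @_; return {host => 'worker1', port => 20001} }

    # Proxy the live log of a running job to the browser.
    get '/tests/:jobid/livelog' => sub {
        my $c = shift;
        my $w = worker_for_job($c->param('jobid'));
        $c->render_later;    # keep the connection open while we fetch
        $c->ua->get("http://$w->{host}:$w->{port}/log" => sub {
            my ($ua, $tx) = @_;
            $c->render(data => $tx->res->body, format => 'txt');
        });
    };

    app->start;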

Actions #2

Updated by oholecek over 9 years ago

  • Status changed from New to In Progress

I spent some time looking into this and into Mojolicious, and now I need some opinions.
The first question is whether we want workers to remain pull-based, i.e. they ask for commands rather than listening for commands from the Scheduler. I currently work with this assumption, so I have introduced four more commands (livelog_start/stop, vnc_start/stop).
I moved the log-reading functionality to the worker script, and now I need this data to get to the Scheduler so it can be proxied to the client. Any suggestions?

I'm not yet very familiar with Mojolicious, so in my current demo (covering only the livelog feature) I just create a Mojo::IOLoop::Server, listen on port $baseport+$worker_id, and print to the client everything I get from there. I have a feeling this is not very much the Mojo way, but it seems straightforward. I also have not yet measured CPU usage or checked performance.
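
For reference, a minimal sketch of that approach, assuming hypothetical $baseport and $worker_id values; the fan-out of log data into the connected streams would happen elsewhere in the worker (this is not the actual demo code):

    use Mojo::Base -strict;
    use Mojo::IOLoop;

    # Hypothetical values; the real worker would derive these from its config.
    my $baseport  = 20000;
    my $worker_id = 1;

    my @clients;

    # Accept connections on $baseport + $worker_id and remember each stream.
    Mojo::IOLoop->server({port => $baseport + $worker_id} => sub {
        my ($loop, $stream, $id) = @_;
        push @clients, $stream;
        $stream->on(close => sub { @clients = grep { $_ ne $stream } @clients });
    });

    # Elsewhere, every new log chunk read from os-autoinst would be fanned out:
    # $_->write($chunk) for @clients;

    Mojo::IOLoop->start unless Mojo::IOLoop->is_running;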

Actions #3

Updated by coolo over 9 years ago

  • Assignee set to oholecek
  • Target version set to Sprint 12

Actions #4

Updated by coolo over 9 years ago

This whole architecture feels very weird to me. Not your fault, the whole polling in the workers is awkward. If you're in the office at 9:30, drop into https://plus.google.com/hangouts/_/event/cb8hf9s89esamj86qhvq07v9ons?authuser=0

Actions #5

Updated by aplanas over 9 years ago

As I understood from the meeting, the remote workers will write their results into the pool directory on an NFS mount point. The good thing about this is that the server (mojo) will not change at all; it will still read from a directory when a user connects to the live stream of a test. And this is very good.

The thing I am not sure about is the NFS limit. IMHO this makes the deployment of openQA more complex than it is now. So why not use a REST service instead of NFS? For this, the worker and/or os-autoinst would PUT/POST to a clear URL based on the path of the pool directory. For example, worker #1 is now writing to /var/openqa/pool/1/.. (or something like this). We can change that so that the worker sends PUT requests to http://SERVERIP/workers/pool/1/.. (see the sketch after the list below).

The good things that I see:

  • Easy deployment.
  • Workers do not need to live in the same network, and there is no need to open firewalls for NFS/RPC services.
  • Cheap in resources: the PUT service is a simple process/thread in mojo (or apache).
  • Secure: there is a single writer in the directory.
  • Easy to implement: mojo will still read from a single directory, so the basic delivery architecture does not change.
  • A simple model, and easier to understand.

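A minimal sketch of the worker side of that proposal, with a hypothetical server route and an illustrative file name (none of this was ever implemented; the decision went to a network file system, see below):

    use Mojo::Base -strict;
    use Mojo::File 'path';
    use Mojo::UserAgent;

    # Illustrative names; the route and file are not real openQA API.
    my $worker_id = 1;
    my $file      = 'autoinst-log.txt';

    my $ua  = Mojo::UserAgent->new;
    my $log = path("/var/openqa/pool/$worker_id/$file")->slurp;

    # Mirror the pool path onto the server instead of writing to NFS.
    my $tx = $ua->put("http://SERVERIP/workers/pool/$worker_id/$file" => $log);
    die $tx->res->message unless $tx->res->is_success;
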
Actions #6

Updated by coolo over 9 years ago

The decision for a network file system has already been made.

Actions #7

Updated by coolo over 9 years ago

see #3136

Actions #8

Updated by oholecek over 9 years ago

aplanas wrote:

> As I understood from the meeting, the remote workers will write their results into the pool directory on an NFS mount point.

No, the pool directory is not an NFS mount (most of the I/O goes there, and not everything in it needs to be shared, e.g. the VM disk). Instead the worker will write os-autoinst.log and the ogv file to some other directory which is an actual NFS mount (similar to testresults, but we will probably create a new one to cope with AppArmor permissions).
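
For illustration only, such a share could look like this, with all host names and paths hypothetical (the ticket does not name the actual directory):

    # /etc/exports on the openQA webui host
    /var/lib/openqa/liveresults  worker1.example.com(rw,sync,no_subtree_check)

    # /etc/fstab on the worker
    openqa.example.com:/var/lib/openqa/liveresults  /var/lib/openqa/liveresults  nfs  defaults  0  0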

> The thing I am not sure about is the NFS limit. IMHO this makes the deployment of openQA more complex than it is now. So why not use a REST service instead of NFS? For this, the worker and/or os-autoinst would PUT/POST to a clear URL based on the path of the pool directory. For example, worker #1 is now writing to /var/openqa/pool/1/.. (or something like this). We can change that so that the worker sends PUT requests to http://SERVERIP/workers/pool/1/..

One thing is that I didn't want to make huge changes to openQA (but that's a personal, not a technical, issue). The other thing is keeping assets and tests synchronized. I had a demo where the tests lived in a git repo and the worker pulled before each run. Then I added SHA sums for ISOs to the DB, but computing one takes around a minute on my machine, so each time a new ISO was added its sum was computed, and each time a test started the ISO was checked.
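
That check is essentially the following, shown as a minimal sketch with an illustrative ISO path (the ticket does not say which SHA variant was used; 256 is an assumption):

    use Digest::SHA;

    # Illustrative path; the real asset lives wherever openQA keeps ISOs.
    my $iso = '/var/lib/openqa/factory/iso/some.iso';

    # Hashing a multi-GB ISO reads the whole file, hence the ~1 minute cost.
    my $digest = Digest::SHA->new(256)->addfile($iso)->hexdigest;
    print "$digest  $iso\n";
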
Yeah, there definitely are optimizations, but from my POV I imagine an openQA cluster to be a closed one, in a similar way as OBS is IMO, with a private network and so on.

Actions #9

Updated by coolo over 9 years ago

The remaining issues:

  • the /tests route always shows the tests in pre- or post-processing while they are running
  • the live build log is very slow

Actions #10

Updated by oholecek over 9 years ago

The live log issue should be solved: https://github.com/os-autoinst/openQA/pull/84
From my tests it seems live now. :)

Actions #11

Updated by coolo over 9 years ago

  • Category set to 124 (Feature requests)
  • Status changed from In Progress to Resolved

It's live now, but we need to redo it, I'm afraid. ;(
