action #4680 (closed)

Parent task: action #3136: support remote workers

livestream support for remote workers

Added by oholecek over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: –
Category: Feature requests
Target version: –
Start date: 2014-11-13
Due date: –
% Done: 0%
Estimated time: –

Description

The test livestream does not work with the current remote worker draft.
I don't understand the code well enough, but I guess it tries to read directly from the pool directory. This directory is, however, not shared across hosts and is present only on the worker machine. I'm quite hesitant to share pools, since the majority of I/O happens there.

What about proxying read-only VNC streams? That does not solve the live log, though.

Actions #1

Updated by coolo over 9 years ago

Proxying is exactly what OBS does. The OBS workers start a daemon and announce its port to the system, and when OBS gets a request for the log file of a currently building job, it proxies that request to the worker building it at the moment.

I would actually prefer that model to the current polling of /commands.
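
A minimal sketch of that proxy model in Mojolicious::Lite, assuming a hypothetical worker_for_job lookup and a hypothetical /log endpoint on the worker daemon (neither exists in openQA; this only illustrates the idea):

    use Mojolicious::Lite;

    # Hypothetical helper: returns the host and port the worker daemon
    # announced when it picked up this job.
    sub worker_for_job { my ($jobid) = @_; return {host => 'worker1', port => 20001} }

    # Proxy the live log of a running job to the browser.
    get '/tests/:jobid/livelog' => sub {
        my $c = shift;
        my $w = worker_for_job($c->param('jobid'));
        $c->render_later;    # keep the connection open while we fetch
        $c->ua->get("http://$w->{host}:$w->{port}/log" => sub {
            my ($ua, $tx) = @_;
            $c->render(data => $tx->res->body, format => 'txt');
        });
    };

    app->start;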

Actions #2

Updated by oholecek over 9 years ago

  • Status changed from New to In Progress

I spent some time looking into this and into Mojolicious, and now I need some opinions.
The first question is whether we want workers to remain pull-based, i.e. they ask for commands rather than listening for commands from the Scheduler. I currently work with this assumption, so I have introduced four more commands (livelog_start/stop, vnc_start/stop).
I moved the log-reading functionality to the worker script, and now I need this data to get to the Scheduler so it can be proxied to the client. Any suggestions?

I'm not yet very familiar with Mojolicious, so in my current demo (covering only the livelog feature) I just create a Mojo::IOLoop::Server, listen on port $baseport+$worker_id, and print to the client everything I get from there. I have a feeling this is not very much the Mojo way, but it seems straightforward. I also have not yet measured CPU usage or checked performance.
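
For reference, a minimal sketch of that approach, assuming hypothetical $baseport and $worker_id values; the fan-out of log data into the connected streams would happen elsewhere in the worker (this is not the actual demo code):

    use Mojo::Base -strict;
    use Mojo::IOLoop;

    # Hypothetical values; the real worker would derive these from its config.
    my $baseport  = 20000;
    my $worker_id = 1;

    my @clients;

    # Accept connections on $baseport + $worker_id and remember each stream.
    Mojo::IOLoop->server({port => $baseport + $worker_id} => sub {
        my ($loop, $stream, $id) = @_;
        push @clients, $stream;
        $stream->on(close => sub { @clients = grep { $_ ne $stream } @clients });
    });

    # Elsewhere, every new log chunk read from os-autoinst would be fanned out:
    # $_->write($chunk) for @clients;

    Mojo::IOLoop->start unless Mojo::IOLoop->is_running;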

Actions #3

Updated by coolo over 9 years ago

  • Assignee set to oholecek
  • Target version set to Sprint 12

Actions #4

Updated by coolo over 9 years ago

This whole architecture feels very weird to me. Not your fault, the whole polling in the workers is awkward. If you're in the office at 9:30, drop into https://plus.google.com/hangouts/_/event/cb8hf9s89esamj86qhvq07v9ons?authuser=0

Actions #5

Updated by aplanas over 9 years ago

As I understood from the meeting, the remote workers will write their results into the pool directory on an NFS mount point. The good thing about this is that the server (mojo) will not change at all; it will still read from a directory when a user connects to the live stream of a test. And this is very good.

The thing I am not sure about is the NFS limit. IMHO this makes the deployment of openQA more complex than it is now. So why not use a REST service instead of NFS? For this, the worker and/or os-autoinst would PUT/POST to a clear URL based on the path of the pool directory. For example, worker #1 is now writing to /var/openqa/pool/1/.. (or something like this). We can change that so that the worker sends PUT requests to http://SERVERIP/workers/pool/1/.. (see the sketch after the list below).

The good things that I see:

  • Easy deployment.
  • Workers do not need to live in the same network, and there is no need to open firewalls for NFS/RPC services.
  • Cheap in resources: the PUT service is a simple process/thread in mojo (or apache).
  • Secure: there is a single writer in the directory.
  • Easy to implement: mojo will still read from a single directory, so the basic delivery architecture does not change.
  • A simple model, and easier to understand.

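A minimal sketch of the worker side of that proposal, with a hypothetical server route and an illustrative file name (none of this was ever implemented; the decision went to a network file system, see below):

    use Mojo::Base -strict;
    use Mojo::File 'path';
    use Mojo::UserAgent;

    # Illustrative names; the route and file are not real openQA API.
    my $worker_id = 1;
    my $file      = 'autoinst-log.txt';

    my $ua  = Mojo::UserAgent->new;
    my $log = path("/var/openqa/pool/$worker_id/$file")->slurp;

    # Mirror the pool path onto the server instead of writing to NFS.
    my $tx = $ua->put("http://SERVERIP/workers/pool/$worker_id/$file" => $log);
    die $tx->res->message unless $tx->res->is_success;
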
Actions #6

Updated by coolo over 9 years ago

The decision for a network file system has already been made.

Actions #7

Updated by coolo over 9 years ago

see #3136

Actions #8

Updated by oholecek over 9 years ago

aplanas wrote:

> As I understood from the meeting, the remote workers will write their results into the pool directory on an NFS mount point.

No, the pool directory is not an NFS mount (most of the I/O goes there, and not everything in it needs to be shared, e.g. the VM disk). Instead the worker will write os-autoinst.log and the ogv file to some other directory which is an actual NFS mount (similar to testresults, but we will probably create a new one to cope with AppArmor permissions).
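
For illustration only, such a share could look like this, with all host names and paths hypothetical (the ticket does not name the actual directory):

    # /etc/exports on the openQA webui host
    /var/lib/openqa/liveresults  worker1.example.com(rw,sync,no_subtree_check)

    # /etc/fstab on the worker
    openqa.example.com:/var/lib/openqa/liveresults  /var/lib/openqa/liveresults  nfs  defaults  0  0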

> The thing I am not sure about is the NFS limit. IMHO this makes the deployment of openQA more complex than it is now. So why not use a REST service instead of NFS? For this, the worker and/or os-autoinst would PUT/POST to a clear URL based on the path of the pool directory. For example, worker #1 is now writing to /var/openqa/pool/1/.. (or something like this). We can change that so that the worker sends PUT requests to http://SERVERIP/workers/pool/1/..

One thing is that I didn't want to make huge changes to openQA (but that's a personal, not a technical, issue). The other thing is keeping assets and tests synchronized. I had a demo where the tests lived in a git repo and the worker pulled before each run. Then I added SHA sums for ISOs to the DB, but computing one takes around a minute on my machine, so each time a new ISO was added its sum was computed, and each time a test started the ISO was checked.
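
That check is essentially the following, shown as a minimal sketch with an illustrative ISO path (the ticket does not say which SHA variant was used; 256 is an assumption):

    use Digest::SHA;

    # Illustrative path; the real asset lives wherever openQA keeps ISOs.
    my $iso = '/var/lib/openqa/factory/iso/some.iso';

    # Hashing a multi-GB ISO reads the whole file, hence the ~1 minute cost.
    my $digest = Digest::SHA->new(256)->addfile($iso)->hexdigest;
    print "$digest  $iso\n";
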
Yeah, there definitely are optimizations, but from my POV I imagine an openQA cluster to be a closed one, in a similar way as OBS is IMO, with a private network and so on.

Actions #9

Updated by coolo over 9 years ago

The remaining issues:

  • the /tests route always shows the tests in pre- or post-processing while they are running
  • the live build log is very slow

Actions #10

Updated by oholecek over 9 years ago

The live log issue should be solved: https://github.com/os-autoinst/openQA/pull/84
From my tests it seems live now. :)

Actions #11

Updated by coolo over 9 years ago

  • Category set to 124 (Feature requests)
  • Status changed from In Progress to Resolved

It's live now, but we need to redo it, I'm afraid. ;(
