action #20526: [tools][openqa][research] Research on Federated openQA - openQA Project (public) - openSUSE Project Management Tool

action #20526

closed

[tools][openqa][research] Research on Federated openQA

Added by szarate over 7 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: szarate
Category: Feature requests
Target version: Milestone 9
Start date: 2017-11-20
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Description

To support testing with openQA from different physical locations, we need to research possible ways to implement a federated openQA.

The idea would be to support scenarios 4, 2 and 1 (in that order of priority) as defined in poo#20514.

Subtasks 3 (0 open — 3 closed)

Updated by szarate over 7 years ago

For now, the initial idea discussed with @coolo was to have a worker on a second instance pick up jobs and add them to its own master.

I believe that for this we would need a fair number of bits/features in openQA, but mainly:

- The possibility for the scheduler on the master webUI to know which jobs need to be run on a specific cluster, and the master webUI should handle these jobs as asynchronous or something similar.
- Each cluster will have its own scheduler/worker (I'd prefer to give this responsibility to the scheduler) constantly checking the master webUI for new builds and jobs.
- When new builds are detected, they are synced based on capabilities (i.e. whether the location supports ISOs, QCOW2 images, repos).
- Jobs that have been added to a second cluster will only be triggered once the build has been synced.
- Once a job is finished, the master webUI gets a notification along with the job results (could be in bulk or per build?).
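To make the capability-based sync and the "only trigger once synced" rule from the list above a bit more concrete, here is a minimal sketch. It is not openQA code: the `Cluster` and `Job` classes, the `capabilities` set and both helper functions are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A remote location (hypothetical model, not an openQA class)."""
    name: str
    capabilities: set[str]                      # asset types the location can host
    synced_builds: set[str] = field(default_factory=set)


@dataclass
class Job:
    job_id: int
    build: str
    asset_type: str                             # e.g. "iso", "qcow2", "repo"


def builds_to_sync(cluster: Cluster, jobs: list[Job]) -> set[str]:
    """Builds the cluster is able to host but has not synced yet."""
    wanted = {j.build for j in jobs if j.asset_type in cluster.capabilities}
    return wanted - cluster.synced_builds


def runnable_jobs(cluster: Cluster, jobs: list[Job]) -> list[Job]:
    """Jobs whose build is already synced and whose asset type is supported."""
    return [
        j for j in jobs
        if j.build in cluster.synced_builds and j.asset_type in cluster.capabilities
    ]
```

With this shape, a cluster that only supports ISOs and QCOW2 images would never be asked to sync a repo-only build, and none of its jobs would be triggered before the build shows up in `synced_builds`.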

Updated by szarate over 7 years ago

I also came across the architecture of Apache Mesos, which sparked a couple of ideas, mainly that of looking into ZooKeeper.

Updated by coolo over 7 years ago

The possibility for the scheduler on the master webUI to know which jobs need to be run on a specific cluster, and the master webUI should handle these jobs as asynchronous or something similar.

This can be done through worker classes I believe.

As a first approach, really keep both openQA instances ignorant of the concept and try to put all the business logic in a bridging worker: grabbing jobs for X worker classes, syncing, scheduling them on another instance, waiting for the results, and sending them back.

And then let's see what support we need. But the idea is not to make openQA job scheduling even more complex.
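A rough sketch of what one pass of such a bridging worker could look like, with both instances kept ignorant of the concept. Everything here is an assumption for illustration: `FakeInstance` is an in-memory stand-in rather than the openQA API, and `bridge_cycle`, `sync_assets` and `mapping` are made-up names.

```python
class FakeInstance:
    """In-memory stand-in for an openQA instance (hypothetical, not the real API)."""

    def __init__(self):
        self.jobs = {}          # job id -> {"settings", "state", "result"}
        self._next_id = 1

    def post_job(self, settings):
        job_id = self._next_id
        self._next_id += 1
        self.jobs[job_id] = {"settings": dict(settings), "state": "scheduled", "result": None}
        return job_id


def bridge_cycle(master, remote, worker_classes, sync_assets, mapping):
    """One pass: grab matching jobs, sync, schedule remotely, send results back.

    `mapping` remembers which master job id was scheduled as which remote job id.
    """
    # Grab scheduled master jobs for our worker classes and hand them over.
    for master_id, job in master.jobs.items():
        wanted = job["settings"].get("WORKER_CLASS") in worker_classes
        if wanted and job["state"] == "scheduled" and master_id not in mapping:
            sync_assets(job["settings"])                      # sync step
            mapping[master_id] = remote.post_job(job["settings"])
            job["state"] = "running"

    # Send back the results of remote jobs that have finished.
    for master_id, remote_id in mapping.items():
        remote_job = remote.jobs[remote_id]
        if remote_job["state"] == "done" and master.jobs[master_id]["state"] != "done":
            master.jobs[master_id].update(state="done", result=remote_job["result"])
```

The point of the sketch is that neither instance needs new scheduling logic: the bridge is the only component that knows about both sides.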

Updated by coolo over 7 years ago

Please! Stop googling random keywords - KISS!

Updated by szarate over 7 years ago

Status changed from New to Resolved

The research was done and sparked a few questions, but first things first:

While working on this, in conversations with @coolo, the idea of creating a worker_bridge came up and resulted in the following PR: https://github.com/os-autoinst/openQA/pull/1414
- This approach takes for granted that a slaveUI (an openQA instance in a separate location) has a setup similar to the openQA instance we have in production.
- The worker bridge will run on a machine with access to both a masterUI and a slaveUI (1 worker bridge = 1 slave), and will query the masterUI for jobs that have WORKER_CLASS=:my_location:our_worker_class
- Once the worker_bridge finds jobs on the masterUI that belong to its instance, it will clone them and add proxied and federated_report to the job settings before posting them to the slaveUI.
- The masterUI should be able to filter the job list by job setting, to avoid generating extra load when searching for jobs.
- The worker bridge will monitor jobs on the slaveUI that have the job setting proxied or use federated_report, and report progress to its masterUI (stored in federated_report) when the job is done, regardless of the state.
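The selection and cloning steps above could be sketched roughly as follows. This is not the code from the PR: the function names are hypothetical, and the values written to proxied and federated_report (a flag and a link back to the master job) are assumptions, since the thread only names the two settings.

```python
def belongs_to_location(settings, location):
    """True if WORKER_CLASS has the form ':<location>:<worker_class>'."""
    return settings.get("WORKER_CLASS", "").startswith(f":{location}:")


def clone_for_slave(master_job_id, settings, master_url):
    """Copy the job settings and mark the copy as proxied.

    Assumption: `proxied` is a simple flag and `federated_report` points back
    to the job on the masterUI. The thread does not specify either value.
    """
    cloned = dict(settings)          # leave the master job's settings untouched
    cloned["proxied"] = "1"
    cloned["federated_report"] = f"{master_url}/tests/{master_job_id}"
    return cloned
```

For example, a job with WORKER_CLASS=:my_location:our_worker_class matches `belongs_to_location(settings, "my_location")`, while a job with a plain worker class does not and stays on the masterUI.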

There is some progress in a few of these areas:

- Add support for colons in worker classes: https://github.com/os-autoinst/openQA/pull/1408
- Add support for getting test results as JSON in the job JSON API: https://github.com/os-autoinst/openQA/pull/1424

Important things to know:

- Reporting of job status to the masterUI is not finished.
- The worker bridge needs a refactor, as it was created as a proof of concept.
- The /test API route needs to display all test results properly.
- There should be some kind of transactional behaviour in case the cloning of jobs fails.
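For the last point, one possible shape for "transactional" cloning is all-or-nothing posting of a batch, undoing whatever was already posted if any job fails. This is only a sketch under that assumption: `clone_batch`, `post_job` and `delete_job` are hypothetical callables, not openQA API calls.

```python
def clone_batch(jobs, post_job, delete_job):
    """Post every job in `jobs`; if any post fails, remove the ones already posted.

    `post_job(settings)` must return the new job id, and `delete_job(job_id)`
    must undo a post. Both are supplied by the caller (hypothetical hooks).
    """
    created = []
    try:
        for settings in jobs:
            created.append(post_job(settings))
    except Exception:
        # Roll back in reverse order so later jobs are removed first.
        for job_id in reversed(created):
            delete_job(job_id)
        raise
    return created
```

Whether a failed clone should be rolled back like this or simply retried is an open design choice; the sketch only shows the rollback variant.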

The main question right now is: when is the worker_bridge triggered, or when should it start pulling jobs from the masterUI? In a conversation with @coolo during the last review, he suggested that the worker_bridge download the assets.

Updated by szarate over 7 years ago

Due date set to 2017-11-20
Start date changed from 2017-09-14 to 2017-11-20

due to changes in a related task
