
coordination #157144

open

coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

[epic] Groups of worker classes: Regions, locations, etc.

Added by okurz 10 months ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2024-03-13
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)

Subtasks: 1 (0 open, 1 closed)

action #157147: Documentation for OSD worker region, location, datacenter keys in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls size:S (Resolved, mkittler, 2024-03-13)


Related issues: 1 (0 open, 1 closed)

Related to openQA Project (public) - action #135035: Optionally restrict multimachine jobs to a single worker (Resolved, mkittler, 2023-09-01)

#1

Updated by okurz 10 months ago

  • Subtask #157147 added
#2

Updated by okurz 9 months ago · Edited

Brainstorming with liv and mkittler, from https://etherpad.opensuse.org/p/suse_qe_tools

Some ideas from playing around with worker classes, worker class tags and additional qualifiers/operators, experimenting with how different variables would look:
PARALLEL_ONE_TAG_ONLY=host - PARALLEL_ONE_TAG_ONLY=region
or
WORKER_CLASS=only:host,only:region
or
WORKER_CLASS=parallel:host,parallel:region
?
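One way to read PARALLEL_ONE_TAG_ONLY=host (or only:host) is: all jobs of a parallel cluster must end up on workers that carry the same value for that tag. A minimal Python sketch of that check, assuming workers advertise tagged classes such as "host:worker40" or "region:prg"; `tag_value` and `cluster_shares_tag` are hypothetical helpers, not part of openQA:

```python
from typing import Iterable, Optional


def tag_value(worker_classes: Iterable[str], tag: str) -> Optional[str]:
    """Return the value of a "tag:value" worker class, e.g. "prg" for tag "region"."""
    prefix = tag + ":"
    for worker_class in worker_classes:
        if worker_class.startswith(prefix):
            return worker_class[len(prefix):]
    return None  # the worker does not carry this tag at all


def cluster_shares_tag(assigned: Iterable[Iterable[str]], tag: str) -> bool:
    """Hypothetical PARALLEL_ONE_TAG_ONLY=<tag> check for one parallel cluster.

    `assigned` holds the worker classes of each worker picked for the cluster.
    Every worker must carry the tag, and all of them with the same value.
    """
    values = {tag_value(classes, tag) for classes in assigned}
    return len(values) == 1 and None not in values


server = {"qemu_x86_64", "tap", "host:worker40", "region:prg"}
client = {"qemu_x86_64", "tap", "host:worker41", "region:prg"}
print(cluster_shares_tag([server, client], "region"))  # True
print(cluster_shares_tag([server, client], "host"))    # False
```

With "region" the two workers qualify, with "host" they do not, which matches the intent of "only one host" for the GRE workaround described further down.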

Trying to use "not" combined with worker class tags, in a way that leads to a better concept that also covers "only" in a later step.
Example: exclude datacenter prg1
WORKER_CLASS=-datacenter:prg1 (<-- mkittler prefers this but it does not give us a template for how to implement "only")
or
WORKER_CLASS=!datacenter:prg1 (not good: "!" triggers history expansion in interactive bash, and GitHub also prefers "-")
or
WORKER_CLASS=not datacenter:prg1
or
WORKER_CLASS=not@datacenter:prg1
or
WORKER_CLASS=not#datacenter:prg1
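A minimal Python sketch of the "-" exclusion variant, assuming the existing comma-separated WORKER_CLASS semantics (a worker must have every listed class) and that "-" only adds "must not have" entries on top. `worker_matches` is a hypothetical helper for illustration, not the openQA scheduler code:

```python
def worker_matches(job_worker_class: str, worker_classes: set) -> bool:
    """Decide whether a worker may take a job, assuming "-" marks an exclusion.

    Plain entries must all be present on the worker; entries prefixed
    with "-" must all be absent from it.
    """
    entries = [e.strip() for e in job_worker_class.split(",") if e.strip()]
    required = {e for e in entries if not e.startswith("-")}
    excluded = {e[1:] for e in entries if e.startswith("-")}
    return required <= worker_classes and not (excluded & worker_classes)


worker_a = {"qemu_x86_64", "tap", "region:prg", "datacenter:prg1"}
worker_b = {"qemu_x86_64", "tap", "region:prg", "datacenter:prg2"}
job = "qemu_x86_64,tap,region:prg,-datacenter:prg1"
print(worker_matches(job, worker_a))  # False
print(worker_matches(job, worker_b))  # True
```

The appeal of this form is that exclusion stays a purely per-worker decision; it does not by itself say anything about "only", which needs to look at all workers of a cluster together.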

Now trying to extend that concept further to also cover "only":
WORKER_CLASS=@host,@region (but hard to understand what the "@" means)
or
WORKER_CLASS=@only:host (here ":" is reused inconsistently: it separates the restrictor from the tag to limit to rather than a tag from its class, but at least the restrictor is prefixed with "@")
or
WORKER_CLASS=only_one#host (defining "only_one" as restrictor separated from a tag with "#", not to be confused with a tag)

qemu multi-machine: must run in prg because there is another resource we need, e.g. an external service, but we only want to run on a single host because we don't trust GRE (a temporary workaround, of course!)
  1. WORKER_CLASS=qemu_x86_64,tap,region-prg
    PARALLEL_ONE_HOST_ONLY=1

  2. WORKER_CLASS=qemu_x86_64,tap,region-prg,only:one_host
  3. alternative:
    WORKER_CLASS=qemu_x86_64,tap,region-prg
    WORKER_CLASS_RESTRICTIONS=parallel_one_host_only,?

  4. WORKER_CLASS=qemu_x86_64,only:one_region

  5. parallel cluster that is highly performance-dependent, so we don't care where it runs, but we want to stay within one region where we assume bandwidth is better:

    • server: WORKER_CLASS=qemu_x86_64,tap
    • client: PARALLEL_WITH=server WORKER_CLASS=qemu_x86_64,tap # ONE_REGION_ONLY=1 <== prescribes a definition of "region" in openQA, which we want to keep generic
  6. WORKER_CLASS=qemu_x86_64,only:region-*

  7. We want to schedule tests in any production datacenter in the prg region because we need better performance when accessing an external service in Prague, but prg1 has no IPv6, so we need to exclude it
    WORKER_CLASS=qemu_x86_64,tap,region:prg,-datacenter:prg1
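For the only:region-* idea, one possible interpretation is that every job of the cluster lands on workers sharing at least one class that matches the glob region-*. A Python sketch under that assumption, using the standard fnmatch module for the wildcard; `shared_matching_class` is a hypothetical name:

```python
from fnmatch import fnmatchcase
from typing import Iterable


def shared_matching_class(assigned: Iterable[set], pattern: str) -> bool:
    """Hypothetical "only:<pattern>" check for one parallel cluster.

    For each assigned worker, keep only the classes matching the glob
    pattern (e.g. "region-*"), then require that at least one of those
    classes is common to every worker of the cluster.
    """
    per_worker = [{c for c in classes if fnmatchcase(c, pattern)} for classes in assigned]
    if not per_worker:
        return False  # an empty cluster cannot satisfy the restriction
    return bool(set.intersection(*per_worker))


a = {"qemu_x86_64", "tap", "region-prg"}
b = {"qemu_x86_64", "tap", "region-prg"}
c = {"qemu_x86_64", "tap", "region-nue"}
print(shared_matching_class([a, b], "region-*"))  # True
print(shared_matching_class([a, c], "region-*"))  # False
```

Compared with ONE_REGION_ONLY=1, the pattern keeps "region" an ordinary worker class naming convention instead of a concept openQA itself would have to know about.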

Our definitions:

  • "qemu_x86_64" is a worker class
  • "region:prg" is a worker class for additional classification
  • "region" is a worker class tag. Generalization: any prefix without ":" (regex "[^:]*") followed by ":" is a worker class tag
  • "not", "only" and "only_one" should be treated as reserved names for worker class tag restrictors. Example: "not:datacenter-prg1" (or "only:region-nue", which does not make sense as it is the same as just specifying "region:nue"). GitHub uses a "-" prefix, e.g. "-datacenter:prg1"
  • "host:worker40" is a worker class for additional classification, potentially filled in automatically by the worker
  • restrictors and worker classes or worker class tags are combined with "#". Examples: "not#datacenter:prg1", "only_one#host"
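To make these definitions concrete, a Python sketch of a parser for the "#" combination syntax. It assumes comma-separated entries, treats the GitHub-style "-" prefix as shorthand for "not#", and, like the notes, leaves open whether a restrictor's target is a class or a tag. `Entry`, `parse_entry` and `parse_worker_class` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Optional

# Reserved restrictor names taken from the definitions above.
RESTRICTORS = {"not", "only", "only_one"}


@dataclass(frozen=True)
class Entry:
    restrictor: Optional[str]  # "not", "only", "only_one" or None
    target: str                # e.g. "qemu_x86_64", "region:prg" or "host"

    @property
    def tag(self) -> Optional[str]:
        # The part before ":" of a worker class is its worker class tag.
        return self.target.split(":", 1)[0] if ":" in self.target else None


def parse_entry(raw: str) -> Entry:
    text = raw.strip()
    restrictor = None
    if "#" in text:
        restrictor, text = text.split("#", 1)
        if restrictor not in RESTRICTORS:
            raise ValueError(f"unknown restrictor {restrictor!r} in {raw!r}")
    elif text.startswith("-"):
        # Assumption: the "-" prefix is shorthand for "not#".
        restrictor, text = "not", text[1:]
    return Entry(restrictor, text)


def parse_worker_class(value: str) -> List[Entry]:
    return [parse_entry(part) for part in value.split(",") if part.strip()]


for entry in parse_worker_class("qemu_x86_64,region:prg,not#datacenter:prg1,only_one#host"):
    print(entry.restrictor, entry.target, entry.tag)
```

Keeping "#" as the only restrictor separator means ":" stays reserved for separating a tag from its value, which avoids the ambiguity noted for "@only:host".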
#3

Updated by okurz 9 months ago

  • Related to action #135035: Optionally restrict multimachine jobs to a single worker added