coordination #157144
open coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance
[epic] Groups of worker classes: Regions, locations, etc.
Updated by okurz 9 months ago · Edited
brainstorming with liv and mkittler, from https://etherpad.opensuse.org/p/suse_qe_tools
Some ideas from playing around with worker classes, worker class tags and additional qualifiers/operators, experimenting with how different variables would read:
PARALLEL_ONE_TAG_ONLY=host - PARALLEL_ONE_TAG_ONLY=region
or
WORKER_CLASS=only:host,only:region
or
WORKER_CLASS=parallel:host,parallel:region
?
Trying to use "not" combined with worker class tags, in a way that leads to a better concept which can also cover "only" in a later step
Exclude datacenter prg1
WORKER_CLASS=-datacenter:prg1 (<-- mkittler prefers this but it does not give us a template for how to implement "only")
or
WORKER_CLASS=!datacenter:prg1 (not good: "!" causes execution errors in bash, and GitHub also prefers "-")
or
WORKER_CLASS=not datacenter:prg1
or
WORKER_CLASS=not@datacenter:prg1
or
WORKER_CLASS=not#datacenter:prg1
now trying to extend that concept further to include "only"
WORKER_CLASS=@host,@region (but hard to understand what the "@" means)
or
WORKER_CLASS=@only:host (we are reusing ":" inconsistently now to separate which tag to limit to, not to separate tag and class but at least we prefix it with "@")
or
WORKER_CLASS=only_one#host (defining "only_one" as restrictor separated from a tag with "#", not to be confused with a tag)
qemu multi-machine; must run in prg because there is another resource we need, e.g. an external service; but we only want to run on a single host because we don't trust GRE (temporary workaround of course!)
1.
WORKER_CLASS=qemu_x86_64,tap,region-prg
PARALLEL_ONE_HOST_ONLY=1
- WORKER_CLASS=qemu_x86_64,tap,region-prg,only:one_host
alternative:
WORKER_CLASS=qemu_x86_64,tap,region-prg
WORKER_CLASS_RESTRICTIONS=parallel_one_host_only,?
WORKER_CLASS=qemu_x86_64,only:one_region
A parallel cluster that is highly performance-dependent, so we don't care where it runs but we want to stay within one region, where we assume bandwidth is better:
- server: WORKER_CLASS=qemu_x86_64,tap
- client: PARALLEL_WITH=server WORKER_CLASS=qemu_x86_64,tap # ONE_REGION_ONLY=1 <== prescribes a definition of "region" in openQA which we want to keep generic
WORKER_CLASS=qemu_x86_64,only:region-*
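A minimal sketch of what "stay within one region" could mean for scheduling, not how openQA implements it. It assumes workers advertise classes in the "tag:value" form (e.g. "region:prg"); the worker names and the helper functions `group_by_tag` and `pick_cluster` are hypothetical.

```python
from collections import defaultdict


def group_by_tag(workers: dict[str, set[str]], tag: str) -> dict[str, list[str]]:
    """Group worker names by the value they carry for a worker class tag."""
    groups = defaultdict(list)
    prefix = tag + ":"
    for name, classes in workers.items():
        for cls in classes:
            if cls.startswith(prefix):
                groups[cls[len(prefix):]].append(name)
    return {value: sorted(names) for value, names in groups.items()}


def pick_cluster(workers: dict[str, set[str]], tag: str, size: int):
    """Return `size` workers that all share one value of `tag`, or None."""
    for _value, names in sorted(group_by_tag(workers, tag).items()):
        if len(names) >= size:
            return names[:size]
    return None


# Hypothetical workers; only the class strings follow the notes above.
workers = {
    "w1": {"qemu_x86_64", "tap", "region:prg"},
    "w2": {"qemu_x86_64", "tap", "region:nue"},
    "w3": {"qemu_x86_64", "tap", "region:nue"},
}
print(pick_cluster(workers, "region", 2))  # ['w2', 'w3']
print(pick_cluster(workers, "region", 3))  # None
```

The function takes the tag name as a parameter, so "region" is not hard-coded and the same logic would cover "host" or "datacenter".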
We want to schedule tests in any production datacenter in the prg region because we need higher performance to an external service in Prague, but prg1 has no IPv6 so we need to exclude it:
WORKER_CLASS=qemu_x86_64,tap,region:prg,-datacenter:prg1
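To illustrate how the "-" prefix could be evaluated, here is a rough sketch, assuming a worker matches when it has every plain class and none of the excluded ones. The function `matches` and the worker class "datacenter:prg2" are made up for the example and are not existing openQA code.

```python
def matches(job_classes: str, worker_classes: set[str]) -> bool:
    """Check one worker against a comma-separated WORKER_CLASS value.

    Assumption: a leading "-" negates a term, as in "-datacenter:prg1".
    """
    for term in job_classes.split(","):
        term = term.strip()
        if not term:
            continue
        if term.startswith("-"):
            # Excluded class: the worker must not have it.
            if term[1:] in worker_classes:
                return False
        elif term not in worker_classes:
            # Required class: the worker must have it.
            return False
    return True


job = "qemu_x86_64,tap,region:prg,-datacenter:prg1"
worker_prg1 = {"qemu_x86_64", "tap", "region:prg", "datacenter:prg1"}
worker_prg2 = {"qemu_x86_64", "tap", "region:prg", "datacenter:prg2"}  # hypothetical
print(matches(job, worker_prg1))  # False
print(matches(job, worker_prg2))  # True
```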
Our definitions:
- "qemu_x86_64" is a worker class
- "region:prg" is a worker class for additional classification
- "region" is a worker class tag. Generalization: "[^:]*" followed by ":" is a worker class tag
- "not" and "only" and "only_one" should be treated as reserved names for restrictors. Example: "not:datacenter-prg1" (or "only:region-nue" <= does not make sense as that is the same as just specifying "region:nue"). GitHub uses a "-" prefix, e.g. "-datacenter:prg1"
- "host:worker40" is a worker class for additional classification. Potentially automatically filled by the worker
- restrictors and worker classes or worker class tags are combined with "#". Examples: "not#datacenter:prg1", "only_one#host"
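The definitions above could be parsed roughly as follows. This is only a sketch of the proposed "#" syntax, assuming the three reserved restrictor names from the list; `Term` and `parse_term` are hypothetical names and nothing here reflects existing openQA code.

```python
from typing import NamedTuple, Optional

# Reserved restrictor names taken from the definitions above.
RESTRICTORS = {"not", "only", "only_one"}


class Term(NamedTuple):
    restrictor: Optional[str]  # "not", "only", "only_one" or None
    tag: Optional[str]         # part before ":" if present, e.g. "region"
    name: str                  # worker class or bare tag, e.g. "region:prg" or "host"


def parse_term(term: str) -> Term:
    """Split one WORKER_CLASS entry into restrictor, tag and name."""
    restrictor, sep, rest = term.partition("#")
    if not sep:
        # No "#": the whole entry is a plain worker class.
        restrictor, rest = None, term
    elif restrictor not in RESTRICTORS:
        raise ValueError(f"unknown restrictor: {restrictor!r}")
    tag, colon, _value = rest.partition(":")
    return Term(restrictor, tag if colon else None, rest)


print(parse_term("qemu_x86_64"))
print(parse_term("region:prg"))
print(parse_term("not#datacenter:prg1"))
print(parse_term("only_one#host"))
```

In this sketch "only_one#host" yields no tag because "host" has no ":" suffix; a real implementation would have to decide whether a bare name after a restrictor is a tag or a class.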
Updated by okurz 9 months ago
- Related to action #135035: Optionally restrict multimachine jobs to a single worker added