action #135035

closed

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Optionally restrict multimachine jobs to a single worker

Added by apappas 8 months ago. Updated 24 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2023-09-01
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Multi-machine jobs have been failing since 20230814 because of a misconfiguration of the MTU/GRE tunnels. A workaround has been found: forcing all jobs of a multi-machine test to run on the same worker.

The purpose of this ticket is to have all multi-machine runs be scheduled on the same well-configured worker.

The change doesn't need to be permanent but it does need to be applied until proper networking between multi-machine nodes can be guaranteed.

Acceptance Criteria

  • AC1: If configured accordingly, all jobs of a multi-machine parallel cluster must be scheduled to run on the same worker host
  • AC2: By default, jobs of a multi-machine parallel cluster can still be scheduled across multiple different hosts
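
As an illustration of the two criteria, here is a minimal sketch of the assignment decision. It is not openQA's scheduler code: `Worker`, `assign_cluster` and the `same_host_only` switch are hypothetical names chosen for this example, and the sketch assumes a cluster is only scheduled when every one of its jobs can get a free worker slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """A free worker slot (hypothetical model); `host` is the machine it runs on."""
    name: str
    host: str


def assign_cluster(jobs, free_workers, same_host_only=False):
    """Return a {job: Worker} mapping for one parallel cluster, or None if the
    cluster cannot be scheduled as a whole with the given free workers."""
    if not same_host_only:
        # AC2 (default): any free workers will do, regardless of their host.
        if len(free_workers) < len(jobs):
            return None
        return dict(zip(jobs, free_workers))

    # AC1: group free workers by host, then use the first host that has
    # enough free slots for the entire cluster.
    by_host = {}
    for worker in free_workers:
        by_host.setdefault(worker.host, []).append(worker)
    for workers in by_host.values():
        if len(workers) >= len(jobs):
            return dict(zip(jobs, workers))
    return None  # no single host can take the whole cluster right now


free = [Worker("w1:1", "host-a"), Worker("w2:1", "host-b"), Worker("w2:2", "host-b")]
cluster = ["server", "client"]

default_plan = assign_cluster(cluster, free)
pinned_plan = assign_cluster(cluster, free, same_host_only=True)
```

With these example inputs the default plan spans `host-a` and `host-b`, while the restricted plan places both jobs on `host-b`, the only host with two free slots. Returning `None` instead of a partial assignment keeps the cluster waiting rather than splitting it across hosts.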

Suggestions


Related issues 6 (5 open, 1 closed)

  • Related to openQA Infrastructure - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, nicksinger, 2023-08-15)
  • Related to openQA Infrastructure - action #150869: Ensure multi-machine tests work on aarch64-o3 (or another but single machine only) size:M (Blocked, mkittler)
  • Related to openQA Project - coordination #157144: [epic] Groups of worker classes: Regions, locations, etc. (New, 2024-03-13)
  • Related to openQA Project - action #112001: [timeboxed:20h][spike solution] Pin multi-machine cluster jobs to same openQA worker host based on configuration (New, 2022-06-03)
  • Copied to openQA Project - action #152737: Support for triggering parallel (multi-machine-)tests within a configured zone or location (New)
  • Copied to openQA Project - action #158143: Make workers unassign/reject/incomplete jobs when across-host multimachine setup is requested but not available (New)
