action #135944

open

openQA Project - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Implement a constantly running monitoring/debugging VM for the multi-machine network

Added by nicksinger 9 months ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-09-18
Due date:
% Done: 0%
Estimated time:

Description

Motivation

We struggle to understand the multi-machine setup and how to configure it properly for newly added openQA machines. In addition, debugging these setups is hard in cases like #134282 because they involve a complex networking stack (advanced Linux networking, Open vSwitch, GRE tunnels between workers, os-autoinst-openvswitch, and KVM/QEMU on top of all of that).

Together with @pcervinka, @okurz and @mkittler, we discussed on 2023-09-18 in Jitsi that it might be a good idea to have a constantly running QEMU instance that is set up like a multi-machine job but with a very basic installation. Such VMs could run e.g. telegraf with basic checks across the whole stack (ping, curl to required sources like scc.suse.de, etc.) and could be accessed via SSH for debugging when something is not working.
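A rough sketch of the kind of checks such a VM might report, assuming Python is available on it. telegraf already ships `ping` and `http_response` input plugins that would likely cover this directly; the script below only illustrates the idea. It prints InfluxDB line protocol so that telegraf's `exec` input could consume it. The gateway address `10.0.2.2`, the measurement names `mm_ping`/`mm_http` and all function names are hypothetical; only scc.suse.de comes from this ticket.

```python
"""Hypothetical connectivity checks for the proposed monitoring VM.

Not part of openQA or telegraf; target addresses are placeholders.
"""
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable, List

# Assumption: 10.0.2.2 as the multi-machine gateway; scc.suse.de is from the ticket.
PING_TARGETS = ["10.0.2.2"]
HTTP_TARGETS = ["https://scc.suse.de"]


def ping_ok(host: str, count: int = 3, timeout_s: int = 5) -> bool:
    """Return True if the system `ping` gets at least one reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # no ping binary installed
    return result.returncode == 0


def http_ok(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if the URL answers at all (any HTTP status counts as reachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        return True  # server answered with 4xx/5xx, so the network path works
    except OSError:
        return False  # DNS failure, refused connection, timeout, TLS error, ...


def to_line_protocol(measurement: str, tag_key: str, tag_value: str,
                     ok: bool, ts_ns: int) -> str:
    """Format one result as InfluxDB line protocol."""
    # Tag values must escape commas, equals signs and spaces.
    escaped = tag_value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")
    return f"{measurement},{tag_key}={escaped} ok={'true' if ok else 'false'} {ts_ns}"


def collect(ping: Callable[[str], bool] = ping_ok,
            http: Callable[[str], bool] = http_ok) -> List[str]:
    """Run all checks once and return one line-protocol record per target."""
    ts = time.time_ns()
    lines = [to_line_protocol("mm_ping", "target", h, ping(h), ts) for h in PING_TARGETS]
    lines += [to_line_protocol("mm_http", "url", u, http(u), ts) for u in HTTP_TARGETS]
    return lines
```

A small wrapper printing `"\n".join(collect())` could be wired into telegraf via `[[inputs.exec]]` with `data_format = "influx"`. The check functions are passed in as parameters so the formatting can be exercised without network access.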

Acceptance criteria

  • AC1: each multi-machine capable worker has a constantly running QEMU instance connected to the multi-machine network stack
    • AC1.1: this setup is defined and configured via salt
    • AC1.2: the VM starts on worker startup via a systemd unit
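AC1.1 and AC1.2 could be met by having salt render a systemd service for the VM. The Python sketch below only shows what such a unit might contain. The tap device name, disk image path, MAC address, the assumption that the multi-machine Open vSwitch bridge is `br1`, and the binary paths (which may differ per distribution) are all hypothetical and would have to match the actual worker setup.

```python
"""Hypothetical renderer for a systemd unit that keeps a monitoring VM running.

All names (tap device, disk image path, MAC, bridge br1) are placeholders;
in practice the values would come from salt pillar data.
"""
from textwrap import dedent


def render_unit(tap: str = "tap_mm_monitor",
                bridge: str = "br1",
                image: str = "/var/lib/mm-monitor/disk.qcow2",
                mac: str = "52:54:00:12:34:56",
                memory_mb: int = 1024) -> str:
    """Return the text of a systemd service file for the monitoring VM."""
    qemu = (
        "/usr/bin/qemu-system-x86_64 -enable-kvm -display none "
        f"-m {memory_mb} "
        f"-drive file={image},if=virtio "
        # Attach the VM to a pre-created tap device instead of letting QEMU run a script.
        f"-netdev tap,id=mmnet,ifname={tap},script=no,downscript=no "
        f"-device virtio-net-pci,netdev=mmnet,mac={mac}"
    )
    return dedent(f"""\
        [Unit]
        Description=Monitoring VM on the openQA multi-machine network
        Wants=network-online.target
        After=network-online.target os-autoinst-openvswitch.service

        [Service]
        # Create the tap device (ignore failure if it exists) and add it to the OVS bridge.
        ExecStartPre=-/usr/sbin/ip tuntap add dev {tap} mode tap
        ExecStartPre=/usr/bin/ovs-vsctl --may-exist add-port {bridge} {tap}
        ExecStartPre=/usr/sbin/ip link set {tap} up
        ExecStart={qemu}
        Restart=always
        RestartSec=10

        [Install]
        WantedBy=multi-user.target
        """)
```

`Restart=always` keeps the VM "constantly running" and `WantedBy=multi-user.target` starts it at boot once enabled. In salt this would more likely be a Jinja template in a `file.managed` state plus a `service.running` state with `enable: True`; which VLAN tag, if any, the tap port should carry is left open here.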

Suggestions


Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry (Resolved, nicksinger, 2023-08-15)

#1

Updated by okurz 9 months ago

  • Target version set to future
#2

Updated by nicksinger 9 months ago

  • Description updated (diff)
  • Status changed from Blocked to New
  • Assignee deleted (nicksinger)
#3

Updated by livdywan 9 months ago

  • Related to action #134282: [tools] network protocols failures on multimachine tests on HA/SAP size:S auto_review:"no candidate.*iscsi-target-overview-service-tab|yast2.+firewall.+services.+add.+zone":retry added
#4

Updated by livdywan 9 months ago

  • Description updated (diff)
#5

Updated by okurz 6 months ago

  • Parent task set to #111929