action #135944

Updated by nicksinger 8 months ago

## Motivation 
 We struggle to understand the multi-machine setup and how to configure it properly for newly added openQA machines. We also have problems debugging these setups in cases like https://progress.opensuse.org/issues/134282 because they rely on a fairly complex networking stack: advanced Linux networking, Open vSwitch, GRE tunnels between workers, os-autoinst-openvswitch, and KVM/QEMU on top of all of that. 

 Together with @pcervinka, @okurz and @mkittler we discussed on 2023-09-18 in Jitsi that it might be a good idea to keep a QEMU instance constantly running that is set up like a multi-machine job but with a very basic installation. These VMs could run e.g. telegraf with basic checks on top of the whole stack (ping, curl to required sources like scc.suse.de, etc.) and could be accessed via SSH for debugging when something is not working. 
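
 As a rough illustration of the basic checks mentioned above, the following Python sketch pings a host and fetches a URL, then formats each result as InfluxDB line protocol, which telegraf's `exec` input can consume when configured with `data_format = "influx"`. The measurement name `mm_check`, the tag names and the helper functions are hypothetical and only serve as an example; the real checks would still need to be agreed on and defined in salt.

```python
import subprocess
import urllib.error
import urllib.request


def escape_tag(value):
    """Escape the characters that are special in line protocol tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(check, target, ok):
    """Format one result; `mm_check` is a made-up measurement name."""
    return (
        f"mm_check,check={escape_tag(check)},target={escape_tag(target)} "
        f"ok={int(ok)}i"
    )


def check_ping(host, timeout=2):
    """Return True if a single ICMP echo request to `host` is answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_http(url, timeout=5):
    """Return True if `url` answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def run_checks(host="scc.suse.de"):
    """Run both checks against `host` and return line protocol strings."""
    return [
        to_line_protocol("ping", host, check_ping(host)),
        to_line_protocol("http", host, check_http(f"https://{host}")),
    ]
```

 A small wrapper that prints `"\n".join(run_checks())` inside the VM would then give telegraf one metric line per check, so a broken layer of the stack shows up as `ok=0i`.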

 ## Acceptance criteria 
 * **AC1:** Each multi-machine-capable worker has a constantly running QEMU instance connected to the multi-machine network stack 
   * **AC1.1:** this setup is defined and configured via salt 
   * **AC1.2:** the VM starts on worker startup via a systemd unit 

 ## Suggestions 
 * Check what openQA executes to spawn VMs (e.g. with `ps` while a multi-machine job is running) 
 * Use the [minimal reproducer](https://progress.opensuse.org/issues/135818) as base 
 * Keep the [best practices](https://progress.opensuse.org/issues/135914) for multi-machine test debugging in mind 
 * Understand what [os-autoinst-openvswitch](https://github.com/os-autoinst/os-autoinst/blob/master/os-autoinst-openvswitch) does
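
 To make the first suggestion concrete, here is a minimal Python sketch that does roughly what `ps` would: it reads `/proc/<pid>/cmdline` for every process, keeps those that look like QEMU and extracts their `-netdev` arguments. It assumes the QEMU binary name starts with `qemu-system`, which may not hold on every worker; the function names are made up for this example.

```python
import os


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments.

    Empty arguments are dropped, which is good enough for inspection.
    """
    return [
        part.decode("utf-8", errors="replace")
        for part in raw.split(b"\0")
        if part
    ]


def is_qemu(argv):
    """Guess whether argv belongs to QEMU (assumes a `qemu-system*` binary)."""
    return bool(argv) and os.path.basename(argv[0]).startswith("qemu-system")


def netdev_args(argv):
    """Return the value following each `-netdev` option."""
    return [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "-netdev"]


def find_qemu_processes(proc_root="/proc"):
    """Map PID to argv for every QEMU-looking process under `proc_root`."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = parse_cmdline(fh.read())
        except OSError:
            continue  # process exited or is not readable
        if is_qemu(argv):
            found[int(entry)] = argv
    return found
```

 Running `find_qemu_processes()` on a worker while a multi-machine job is active and passing each argv to `netdev_args` should show how the VM is attached to the network, which can serve as a starting point for the permanently running instance.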
