action #135944
openQA Project (public) - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens
openQA Project (public) - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers
Implement a constantly running monitoring/debugging VM for the multi-machine network
Description
Motivation
We struggle to understand the multi-machine setup and how to configure it properly for newly added openQA machines. We also have problems debugging these setups in cases like #134282 because they rely on a quite complex networking stack: advanced Linux networking, Open vSwitch, GRE tunnels between workers, os-autoinst-openvswitch, and KVM/QEMU on top of all of that.
Together with @pcervinka, @okurz and @mkittler we discussed on 2023-09-18 in Jitsi that it might be a good idea to have a constantly running QEMU instance which is set up like a multi-machine job but with a very basic installation. These VMs could run e.g. telegraf with basic checks on top of the whole stack (ping, curl to different required sources like scc.suse.de, etc.) and could be accessed via SSH for debugging in case something is not working.
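As a rough illustration of the "basic checks" idea, here is a minimal Python sketch that runs a few commands inside the VM and prints the results in InfluxDB line protocol, which telegraf could collect via its exec input. Everything named here is an assumption for illustration: the check names, the gateway address 10.0.2.2, the measurement name mm_network_check and the helper functions are hypothetical and not part of any existing setup.

```python
import subprocess
import sys

# Hypothetical check list: names, targets and timeouts are assumptions.
CHECKS = {
    # Assumed address of the multi-machine bridge/gateway on the worker.
    "ping_gateway": ["ping", "-c", "1", "-W", "2", "10.0.2.2"],
    # scc.suse.de is one of the required sources mentioned in this ticket.
    "curl_scc": ["curl", "--silent", "--fail", "--max-time", "10",
                 "--output", "/dev/null", "https://scc.suse.de"],
}


def run_check(command, timeout=15):
    """Return True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or hanging command both count as a failed check.
        return False
    return result.returncode == 0


def run_all(checks):
    """Run every check and map its name to a boolean result."""
    return {name: run_check(command) for name, command in checks.items()}


def to_influx_line(results, measurement="mm_network_check"):
    """Format results as one InfluxDB line protocol record (1i = ok, 0i = failed)."""
    fields = ",".join(
        f"{name}={1 if ok else 0}i" for name, ok in sorted(results.items())
    )
    return f"{measurement} {fields}"
```

A script calling print(to_influx_line(run_all(CHECKS))) could then be run periodically, so that a failing ping or curl shows up as a 0 in monitoring before anyone needs to SSH into the VM.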
Acceptance criteria
- AC1: each multi-machine capable worker has a constantly running QEMU instance connected to the multi-machine network stack
- AC1.1: this setup is defined and configured via salt
- AC1.2: the VM starts on worker startup via a systemd unit
Suggestions
- Check what openQA executes to spawn VMs (e.g. with ps while a multi-machine job is running)
- Use the minimal reproducer as base
- Keep the best practices for multi-machine test debugging in mind
- Understand what os-autoinst-openvswitch does
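For the first suggestion, a small Python sketch like the following could help pull the network-related options out of the QEMU command lines that ps reports while a multi-machine job is running. The helper names are hypothetical, and the sample command line in the usage note is made up for illustration, not copied from a real openQA worker.

```python
import shlex
import subprocess


def list_qemu_commands():
    """Return (pid, command line) pairs for processes that mention qemu-system."""
    # Separate -o options are used so that the empty header of one column
    # does not swallow the next column name.
    output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    processes = []
    for line in output.splitlines():
        pid, _, args = line.strip().partition(" ")
        if "qemu-system" in args:
            processes.append((int(pid), args.strip()))
    return processes


def netdev_options(command_line):
    """Return the values passed to every -netdev option of a QEMU command line."""
    tokens = shlex.split(command_line)
    return [
        value
        for option, value in zip(tokens, tokens[1:])
        if option == "-netdev"
    ]
```

Applied to a made-up command line such as "qemu-system-x86_64 -m 1024 -netdev tap,id=qanet0,ifname=tap0,script=no,downscript=no -device virtio-net,netdev=qanet0", netdev_options returns the single tap definition, which is the part a constantly running monitoring VM would most likely need to replicate.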