action #178906

closed

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

Support with broken MultiMachine setup size:S

Added by szarate 17 days ago. Updated 1 day ago.

Status: Resolved
Priority: High
Assignee:
Category: Support
Target version:
Start date: 2025-03-14
Due date:
% Done: 0%
Estimated time:

Description

On a fresh Leap 15.6 install (the machine has two network interfaces) I am trying to set up a multi-machine cluster using this script, as mentioned in the documentation (machine setup is simple: https://en.opensuse.org/openSUSE:OpenQA:Setup):

instances=2 ethernet=eth1 bash -x $(which os-autoinst-setup-multi-machine)

Unfortunately after this the tests fail with:

 qemu-system-x86_64: -netdev tap,id=qanet0,ifname=tap21,script=no,downscript=no: could not configure /dev/net/tun (tap21): Operation not permitted

Also the firewall seems borked:

Error: RUNNING_BUT_FAILED: Changing permanent configuration is not allowed while firewalld is in FAILED state. The permanent configuration must be fixed and then firewalld restarted. Try `firewall-offline-cmd --check-config`.

Calling the command suggested above:

Configuration error: INVALID_INTERFACE: Zone 'public': interface 'eth1' already bound to zone 'trusted'
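
The INVALID_INTERFACE message means the permanent configuration binds eth1 in two zones at once. A minimal sketch for locating the conflicting entries, assuming the permanent zone files live in firewalld's default directory /etc/firewalld/zones and that bindings appear there as <interface name="..."/> elements. The function name zones_binding_interface is hypothetical:

```shell
# Print the name of every zone whose permanent XML file binds the interface.
# $1: interface name, $2: zone directory (assumed default: /etc/firewalld/zones)
zones_binding_interface() {
    iface="$1"
    zones_dir="${2:-/etc/firewalld/zones}"
    for file in "$zones_dir"/*.xml; do
        # Skip the unexpanded glob when the directory has no zone files
        [ -e "$file" ] || continue
        # Fixed-string match including the closing quote, so eth1 does not match eth10
        if grep -qF "<interface name=\"$iface\"" "$file"; then
            basename "$file" .xml
        fi
    done
}

# Usage: more than one line of output indicates a conflicting binding
# zones_binding_interface eth1
```

On wicked-managed systems the zone may additionally come from a ZONE= entry in /etc/sysconfig/network/ifcfg-*, which this sketch does not inspect.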

Suggestions

  • Redo the above steps to reproduce the issue or work with szarate to further investigate
  • Crosscheck with how we test os-autoinst-setup-multi-machine, e.g. in openQA-in-openQA tests https://openqa.opensuse.org/group_overview/24


Related issues 1 (1 open, 0 closed)

Related to openQA Project (public) - action #159414: Ensure that os-autoinst-setup-multi-machine reliably sets firewall zones not interfering with /etc/sysconfig/network/ifcfg-* size:S (Workable, 2024-03-18)

Actions #1

Updated by szarate 17 days ago

  • Subject changed from Support with MultiMachine setup to Support with broken MultiMachine setup
Actions #2

Updated by szarate 17 days ago

  • Description updated (diff)

A failing job: http://quake2.qe.nue2.suse.org/tests/5. Moreover, the worker instances that I restarted now show as unavailable: http://quake2.qe.nue2.suse.org/admin/workers/1

Note that I can reinstall if needed.

Actions #3

Updated by szarate 17 days ago

  • Description updated (diff)
Actions #4

Updated by okurz 17 days ago

  • Tags set to reactive work
  • Category set to Support
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #5

Updated by okurz 17 days ago

  • Parent task set to #111929
Actions #6

Updated by gpathak 14 days ago

  • Related to action #159414: Ensure that os-autoinst-setup-multi-machine reliably sets firewall zones not interfering with /etc/sysconfig/network/ifcfg-* size:S added
Actions #8

Updated by szarate 14 days ago

I tried setting it up with both Wicked and NetworkManager.

Currently the machine has Wicked configured, but it can be reinstalled if needed.

Actions #9

Updated by gpathak 14 days ago

szarate wrote:

On a fresh leap 15.6 install I am trying to setup a multi-machine cluster using this script as mentioned in the documentation: (machine setup is simple https://en.opensuse.org/openSUSE:OpenQA:Setup)

instances=2 ethernet=eth1 bash -x $(which os-autoinst-setup-multi-machine) (machine has two network interfaces)

unfortunately after this the tests fail with

 qemu-system-x86_64: -netdev tap,id=qanet0,ifname=tap21,script=no,downscript=no: could not configure /dev/net/tun (tap21): Operation not permitted

I found this exact error easily reproducible with NetworkManager.
I didn't get the firewall error, though. Also, it seems to me that openQA-worker and the multi-machine setup were installed from the official repositories.
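
"Operation not permitted" on /dev/net/tun generally means QEMU could neither attach to an existing tap device nor create a new one: the device is missing or owned by another user, and the process lacks CAP_NET_ADMIN. A minimal sketch for checking the first two conditions, assuming Linux sysfs exposes tun/tap devices under /sys/class/net with an owner attribute. The function name tap_owner_uid is hypothetical:

```shell
# Print the UID that owns a persistent tap device, or fail if it does not exist.
# $1: device name, $2: sysfs network directory (assumed default: /sys/class/net)
tap_owner_uid() {
    dev="$1"
    net_dir="${2:-/sys/class/net}"
    if [ ! -d "$net_dir/$dev" ]; then
        echo "$dev: no such device" >&2
        return 1
    fi
    # tun/tap devices expose the owning UID here; other interface types do not
    if [ ! -r "$net_dir/$dev/owner" ]; then
        echo "$dev: not a tun/tap device" >&2
        return 2
    fi
    cat "$net_dir/$dev/owner"
}

# Usage: compare the owner with the UID of the user the worker runs as
# (assumed here to be _openqa-worker), for example
# tap_owner_uid tap21 && id -u _openqa-worker
```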

Actions #10

Updated by okurz 11 days ago

  • Subject changed from Support with broken MultiMachine setup to Support with broken MultiMachine setup size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #11

Updated by gpathak 7 days ago

  • Status changed from Workable to In Progress
  • Assignee set to gpathak
Actions #12

Updated by gpathak 7 days ago

Trying to reproduce the issue, performing a fresh openSUSE Leap 15.6 installation

Actions #13

Updated by gpathak 7 days ago · Edited

Unfortunately, I cannot reproduce this on my local setup, a fresh Leap 15.6 installation on which I followed the steps from https://en.opensuse.org/openSUSE:OpenQA:Setup.
The web UI and the workers are running on the same machine in my case.

rsync-server:

rsync-client:

Actions #14

Updated by szarate 7 days ago · Edited

@gpathak Could you try to reproduce it on one of the quake machines? I assume that there are some differences in setup.

Actions #15

Updated by gpathak 7 days ago

szarate wrote in #note-14:

@gpathak Could you try to reproduce it on one of the quake machines I assume that there are some differences in setup

Sure, I will try to reproduce it on one of the quake machines.

Actions #16

Updated by openqa_review 6 days ago

  • Due date set to 2025-04-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #17

Updated by gpathak 6 days ago · Edited

I tried this on quake2.qe.nue2.suse.org, and the execution was successful: http://10.168.195.106/tests/13#dependencies

I didn't perform a fresh Leap 15.6 installation on quake2; instead I removed the packages and re-installed them. Below are the commands that I executed (after some trial and error) for removing, re-installing and configuring openQA and the multi-machine setup:

zypper rm openQA* os-autoinst* firewalld* openvswitch* apache2*
rm -rvf /etc/openqa/
rm -rvf /var/lib/openqa/
rm -rf /etc/firewalld
rm -rf /etc/openvswitch/
find / -name "*openqa*" | xargs rm -rvf
find / -name "*apache2*" | xargs rm -rvf
find / -name "*openqa*" | xargs rm -rvf
reboot

After the machine booted, I logged in via SSH and followed https://en.opensuse.org/openSUSE:OpenQA:Setup:

zypper ref
zypper in openQA-bootstrap firewalld firewalld-bash-completion firewalld-lang firewalld-test
systemctl enable --now firewalld.service
skip_suse_specifics=1 skip_suse_tests=1 /usr/share/openqa/script/openqa-bootstrap
firewall-cmd --zone=public --add-service=http --permanent
firewall-cmd --add-port=5991/tcp --permanent
firewall-cmd --add-port=5992/tcp --permanent

I don't know why the above command firewall-cmd --zone=public --add-service=http --permanent didn't work; I had to explicitly run:

firewall-cmd --permanent --add-port=80/tcp --zone=public
firewall-cmd --permanent --add-port=80/udp --zone=public
firewall-cmd --reload
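
With firewall-cmd, --permanent only writes the stored configuration; the running firewall does not change until firewall-cmd --reload. The first three commands above were not followed by a reload, which may be why the http service appeared not to work. A minimal sketch that groups the rules and reloads once, assuming the public zone for all rules (the original port commands used the default zone). The function name open_webui_ports and the FIREWALL_CMD override are hypothetical:

```shell
# Command to invoke; overridable so the sequence can be dry-run with echo
FIREWALL_CMD="${FIREWALL_CMD:-firewall-cmd}"

# Add the permanent rules for a zone, then reload so they take effect at runtime.
# $1: zone name (assumed default: public)
open_webui_ports() {
    zone="${1:-public}"
    "$FIREWALL_CMD" --permanent --zone="$zone" --add-service=http &&
        "$FIREWALL_CMD" --permanent --zone="$zone" --add-port=5991/tcp &&
        "$FIREWALL_CMD" --permanent --zone="$zone" --add-port=5992/tcp &&
        "$FIREWALL_CMD" --reload
}

# Usage (as root): open_webui_ports public
# Dry run:         FIREWALL_CMD=echo open_webui_ports public
```

Chaining with && stops at the first failing rule, so a reload is never issued on top of a half-applied change.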

Then I added the tap worker class in workers.ini and after that ran:

systemctl enable --now openqa-worker-plain@{1..2}.service
systemctl restart openqa-worker-plain@{1..2}.service
instances=2 ethernet=eth0 bash -x $(which os-autoinst-setup-multi-machine)
systemctl restart os-autoinst-openvswitch.service
wicked ifup all
openqa-clone-job --skip-chained-deps --show-progress https://openqa.opensuse.org/tests/4942922
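
For multi-machine jobs to be scheduled on a worker, its WORKER_CLASS in workers.ini has to include tap. A minimal sketch that checks this before restarting the workers, assuming the default path /etc/openqa/workers.ini and a comma-separated WORKER_CLASS value. The function name has_tap_worker_class is hypothetical:

```shell
# Succeed if any WORKER_CLASS line in the config lists "tap" as one of its classes.
# $1: config file (assumed default: /etc/openqa/workers.ini)
has_tap_worker_class() {
    config="${1:-/etc/openqa/workers.ini}"
    grep -E '^[[:space:]]*WORKER_CLASS[[:space:]]*=' "$config" |
        sed -e 's/^[^=]*=//' |
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -qx 'tap'
}

# Usage:
# has_tap_worker_class || echo "tap is missing from WORKER_CLASS"
```

The pipeline keeps only the value after the equals sign, splits it into one class per line, trims whitespace and requires an exact match, so a class that merely starts with "tap" is not accepted.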

The multi-machine test didn't pass on the first attempt.

I had to run systemctl restart os-autoinst-openvswitch.service openvswitch.service and then wicked ifup all to get rid of this error (http://10.168.195.106/tests/10#line-68): Open vSwitch command 'set_vlan' with arguments 'tap1 1' failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.opensuse.os_autoinst.switch was not provided by any .service files
Then I had to install ffmpeg-4 to fix this error (http://10.168.195.106/tests/9#line-46): Can't exec "ffmpeg": No such file or directory at /usr/lib/os-autoinst/backend/baseclass.pm line 348, <$fh> line 16.
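
Both of these failures surface only once a job is already running. A minimal preflight sketch that reports missing executables and inactive services before a job is cloned. The function names missing_commands and inactive_units are hypothetical, and the commands and units named in the usage lines are assumptions based on the errors above:

```shell
# Print each given command name that cannot be found in PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Print each given systemd unit that is not currently active.
inactive_units() {
    for unit in "$@"; do
        systemctl is-active --quiet "$unit" || echo "$unit"
    done
}

# Usage:
# missing_commands ffmpeg qemu-system-x86_64
# inactive_units openvswitch.service os-autoinst-openvswitch.service
```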

But I didn't get any of these errors:

qemu-system-x86_64: -netdev tap,id=qanet0,ifname=tap21,script=no,downscript=no: could not configure /dev/net/tun (tap21): Operation not permitted

Error: RUNNING_BUT_FAILED: Changing permanent configuration is not allowed while firewalld is in FAILED state. The permanent configuration must be fixed and then firewalld restarted. Try firewall-offline-cmd --check-config.

Configuration error: INVALID_INTERFACE: Zone 'public': interface 'eth1' already bound to zone 'trusted'

Actions #18

Updated by gpathak 6 days ago

Please let me know how I can perform a fresh Leap 15.6 installation on quake2, and then I can perform the steps from the wiki on quake2 from scratch.

Actions #19

Updated by gpathak 6 days ago

gpathak wrote in #note-18:

Please let me know how I can perform a fresh Leap 15.6 installation on quake2, and then I can perform the steps from the wiki on quake2 from scratch.

Did a fresh Leap 15.6 installation on quake2 (with wicked), will be performing these setup steps: https://en.opensuse.org/openSUSE:OpenQA:Setup

Actions #20

Updated by szarate 6 days ago

gpathak wrote in #note-19:

gpathak wrote in #note-18:

Please let me know how I can perform a fresh Leap 15.6 installation on quake2, and then I can perform the steps from the wiki on quake2 from scratch.

Did a fresh Leap 15.6 installation on quake2 (with wicked), will be performing these setup steps: https://en.opensuse.org/openSUSE:OpenQA:Setup

Cool, thanks @gpathak lmk how it goes

Actions #21

Updated by gpathak 6 days ago

szarate wrote in #note-20:

gpathak wrote in #note-19:

gpathak wrote in #note-18:

Please let me know how I can perform a fresh Leap 15.6 installation on quake2, and then I can perform the steps from the wiki on quake2 from scratch.

Did a fresh Leap 15.6 installation on quake2 (with wicked), will be performing these setup steps: https://en.opensuse.org/openSUSE:OpenQA:Setup

Cool, thanks @gpathak lmk how it goes

Indeed, I got this error:

Error: RUNNING_BUT_FAILED: Changing permanent configuration is not allowed while firewalld is in FAILED state. The permanent configuration must be fixed and then firewalld restarted. Try `firewall-offline-cmd --check-config`.
Actions #24

Updated by gpathak 5 days ago · Edited

Verified with the recent changes made in the MR in response to review comments.
MM tests no longer complain with could not configure /dev/net/tun (tap21): Operation not permitted, and os-autoinst-setup-multi-machine also takes care of putting the interface in the correct firewalld zone.

Test execution result: http://quake2.qe.nue2.suse.org/tests/2

Actions #25

Updated by gpathak 4 days ago

  • Status changed from In Progress to Feedback
Actions #26

Updated by szarate 4 days ago

Thanks for looking into this. I will reinstall quake2 and report back whether there are more issues or everything is running fine.

Actions #27

Updated by gpathak 4 days ago

szarate wrote in #note-26:

Thanks for looking into this, I will reinstall quake2 and report back whether there are more issues or if everything was running fine

You can re-install the OS on quake2, but the MR is not yet approved and merged; I tested these changes locally.

Actions #28

Updated by gpathak 1 day ago

  • Status changed from Feedback to Resolved
Actions #29

Updated by okurz 1 day ago

  • Due date deleted (2025-04-08)