action #64700

setup o3 workers openqaworker4 and openqaworker7 for multi-machine tests

Added by okurz over 1 year ago. Updated about 1 year ago.

Related issues

Related to openQA Tests - action #64970: [desktop][opensuse][multi-machine] test fails in xrdp_client to connect to server (Resolved, 2020-03-29)


#1 Updated by okurz over 1 year ago

For testing on w7, which was previously configured as an MM worker for osd, I added ",tap" to the worker class of instances :3 and :4 with

vim /etc/openqa/workers.ini
firewall-cmd --zone=trusted --add-masquerade
systemctl restart openqa-worker@{3..4}
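
The manual vim edit could also be scripted. The following is only a sketch: the function name add_tap_class is hypothetical, and it assumes each worker instance has its own [N] section containing a WORKER_CLASS line. It prints the modified content instead of changing the file, so the result can be reviewed first.

```shell
# Hypothetical helper: append ",tap" to WORKER_CLASS in the given
# instance sections of a workers.ini-style file and print the result.
# Usage: add_tap_class FILE SECTION...
add_tap_class() {
    file=$1; shift
    awk -v sections="$*" '
        BEGIN {
            n = split(sections, s, " ")
            for (i = 1; i <= n; i++) want["[" s[i] "]"] = 1
        }
        # A section header switches the "active" flag on or off.
        /^\[.*\]$/ { active = ($0 in want) }
        # Only touch WORKER_CLASS lines that do not already end in ",tap".
        active && /^WORKER_CLASS[ \t]*=/ && $0 !~ /,tap$/ { $0 = $0 ",tap" }
        { print }
    ' "$file"
}
```

For example, `add_tap_class /etc/openqa/workers.ini 3 4` would print the file with ",tap" appended for instances 3 and 4 only.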

successful tests:

so I restarted the other workers as well:

systemctl restart openqa-worker@{1..2} openqa-worker@{5..14}

Depending on and the history on aarch64, I assume this is what we would need to do for w4, which seems to have never been configured for MM:

zypper -n --no-refresh in firewalld openvswitch os-autoinst-openvswitch libcap-progs
systemctl enable --now firewalld openvswitch os-autoinst-openvswitch
echo 'OS_AUTOINST_USE_BRIDGE=br1' > /etc/sysconfig/os-autoinst-openvswitch
ovs-vsctl add-br br1
cat > /etc/sysconfig/network/ifcfg-tap0 <<EOF
for i in {1..14} {64..77} {128..141}; do echo OVS_BRIDGE_PORT_DEVICE_$i=\'tap$i\' ; done >> /etc/sysconfig/network/ifcfg-br1
for i in {1..14} {64..77} {128..141}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
firewall-cmd --set-default-zone=trusted
firewall-cmd --zone=trusted --add-masquerade
for i in br1 eth0 ovs-system; do firewall-cmd --zone=trusted --add-interface=$i; done
firewall-cmd --runtime-to-permanent
setcap CAP_NET_ADMIN=ep /usr/bin/qemu-system-x86_64
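
The three ranges in the loops above ({1..14} {64..77} {128..141}) appear to follow one pattern: a block of tap devices per worker instance, repeated at offsets 64 and 128. As a sketch, assuming that pattern is what is intended, the hypothetical helpers tap_ids and br1_port_lines below derive the list from the number of worker instances, so the ifcfg-br1 lines can be previewed before appending them to the real file.

```shell
# Hypothetical helper: print the tap device numbers for N worker
# instances, i.e. 1..N, 64..(63+N) and 128..(127+N).
tap_ids() {
    n=$1
    for offset in 0 64 128; do
        i=1
        while [ "$i" -le "$n" ]; do
            if [ "$offset" -eq 0 ]; then
                echo "$i"
            else
                echo $((offset + i - 1))
            fi
            i=$((i + 1))
        done
    done
}

# Hypothetical helper: print the bridge port lines for N worker
# instances instead of writing them to ifcfg-br1 directly.
br1_port_lines() {
    for id in $(tap_ids "$1"); do
        echo "OVS_BRIDGE_PORT_DEVICE_$id='tap$id'"
    done
}
```

`br1_port_lines 14` prints one line per device covered by the loop above; redirecting it with `>> /etc/sysconfig/network/ifcfg-br1` would be the non-preview variant.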

#2 Updated by okurz over 1 year ago

  • Description updated (diff)

#3 Updated by okurz over 1 year ago

  • Related to action #64970: [desktop][opensuse][multi-machine] test fails in xrdp_client to connect to server added

#4 Updated by okurz over 1 year ago

apparently openqaworker7 is producing some problematic job results. E.g.

[28/03/2020 17:11:19] <DimStar> okurz: is more painful :)
[28/03/2020 17:11:36] <DimStar> success/failure ratio is far off
[28/03/2020 17:11:59] <DimStar> I thin 10 days ago is when we removed OW1, right?

Seems like desktopapps-remote-desktop-xrdp-client1 consistently does not work on openqaworker7 so test reviewers retrigger failed tests until it happens to be run on openqaworker1 which seems to be stable. DimStar also mentioned other problems, like , also on openqaworker7. Could be something special about the firewall maybe. " - firewall might be sonething..or dns config", also w7. I have disabled "tap" from worker class on openqaworker7 and restarted worker instances. Let's see if this helps. as an interesting example because it is not a multi-machine test. Maybe we can look into this one first, should be easier to crosscheck.

Also, what I saw as differences in configuration: On w1 only "br1" is in "trusted" zone, on w7 it's "br1 eth0 tap…", same on aarch64. Also the config differs in "STARTMODE" and the explicit "ZONE" in /etc/sysconfig/network/ifcfg-tap*
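
To make such a zone comparison repeatable, the interface lists of two hosts can be diffed. A minimal sketch, assuming the lists come from `firewall-cmd --zone=trusted --list-interfaces` (which prints space-separated interface names); the function name only_in_second is hypothetical.

```shell
# Hypothetical helper: print every interface that appears in the
# second space-separated list but not in the first.
# Usage: only_in_second "LIST_A" "LIST_B"
only_in_second() {
    for ifc in $2; do
        case " $1 " in
            *" $ifc "*) ;;          # present in both lists, skip
            *) echo "$ifc" ;;       # only in the second list
        esac
    done
}
```

Called as `only_in_second "br1" "br1 eth0 tap1"` it prints eth0 and tap1, one per line.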

So now on w7 I did:

cat > /etc/sysconfig/network/ifcfg-tap0 <<EOF
> BOOTPROTO='none'
> STARTMODE='auto'
> TUNNEL='tap'
> TUNNEL_SET_GROUP='nogroup'
> TUNNEL_SET_OWNER='_openqa-worker'
> ZONE='public'
> EOF
for i in {1..20} {64..83} {128..147}; do ln -sf /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
for i in {0..20} {64..83} {128..147}; do firewall-cmd --zone=trusted --remove-interface=eth0; done
firewall-cmd --runtime-to-permanent

and looking into the "kubeadm" failure:

$ build=okurz_investigation_poo64700; for i in 1 7 ; do build=$build openqa-clone-set ${build}_kubeadm_w$i WORKER_CLASS=openqaworker$i; done

shows that 10/10 jobs on openqaworker1 and 10/10 jobs on openqaworker7 fail in the same way, so I reject the hypothesis that the kubeadm failure is specific to the MM setup on openqaworker7.
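
Separately from the kubeadm check: the ifcfg-tap0 file written through the here-document earlier in this comment is easy to mistype. The following is a minimal sketch of a quoting check, assuming every setting in that file uses the single-quoted KEY='value' form; the function name check_ifcfg is hypothetical.

```shell
# Hypothetical helper: print every line of an ifcfg file that is not
# empty, not a comment and not of the form KEY='value'.
# Returns success only if no such line exists.
check_ifcfg() {
    ! grep -vE "^(#.*|[A-Z_][A-Z0-9_]*='[^']*')?$" "$1"
}
```

`check_ifcfg /etc/sysconfig/network/ifcfg-tap0` would then print any malformed line, such as one with an unbalanced quote, and return a non-zero exit status.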

After the above changes I triggered some jobs again:

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_yast2_nfs_v4_server

Created job #1219043: opensuse-Tumbleweed-DVD-x86_64-Build20200329-yast2_nfs_v4_server@64bit ->

as a single test out of a mm-pair which works fine on its own.

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_yast2_nfs_v4_client

Created job #1219049: opensuse-Tumbleweed-DVD-x86_64-Build20200327-yast2_nfs_v4_server@64bit ->
Created job #1219050: opensuse-Tumbleweed-DVD-x86_64-Build20200327-yast2_nfs_v4_client@64bit ->

which fail in

But let's check the basics again with wicked_basic:

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_wicked_basic_sut

Created job #1219103: opensuse-Tumbleweed-DVD-x86_64-Build20200329-wicked_basic_ref@64bit ->
Created job #1219104: opensuse-Tumbleweed-DVD-x86_64-Build20200329-wicked_basic_sut@64bit ->

failed. shows

# ping -c 1|| journalctl -b --no-pager > /dev/ttyS0; echo MWhDi-$?-
PING ( 56(84) bytes of data.
From icmp_seq=1 Destination Host Unreachable

TODO read older tickets to remind myself, e.g. #30892 , #52499 , #55043 , #31978

#5 Updated by okurz over 1 year ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

Unfortunately I did not progress beyond #64700#note-4. I didn't find time to refresh my memory of the old setup.

#6 Updated by okurz over 1 year ago

  • Priority changed from Normal to Low

#7 Updated by okurz about 1 year ago

  • Target version set to future
