action #64700

setup o3 workers openqaworker4 and openqaworker7 for multi-machine tests

Added by okurz over 1 year ago. Updated 9 months ago.

Status: Workable
Priority: Low
Assignee: -
Target version:
Start date: 2020-03-20
Due date:
% Done: 0%
Estimated time:

Description

references

#52499


Related issues

Related to openQA Tests - action #64970: [desktop][opensuse][multi-machine] test fails in xrdp_client to connect to server (Resolved, 2020-03-29)

History

#1 Updated by okurz over 1 year ago

For testing on w7, which had previously been configured as an MM worker for osd, I added ",tap" to the worker class of instances :3 and :4 with

vim /etc/openqa/workers.ini
firewall-cmd --zone=trusted --add-masquerade
systemctl restart openqa-worker@{3..4}
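
For illustration, the worker class edit could be scripted roughly as follows. This is only a sketch that operates on a temporary file, not on /etc/openqa/workers.ini. The helper name `add_tap_class`, the base class `qemu_x86_64` and the sample file contents are assumptions for the example and are not taken from the w7 configuration.

```shell
# Hypothetical helper: append ",tap" to WORKER_CLASS in the given [N]
# sections of a workers.ini-style file. Lines already ending in ",tap"
# are left alone.
add_tap_class() {
    file=$1; shift
    awk -v want=" $* " '
        /^\[/ { sec = substr($0, 2, length($0) - 2) }   # "[3]" -> "3"
        /^WORKER_CLASS[ ]*=/ && index(want, " " sec " ") && $0 !~ /,tap$/ {
            $0 = $0 ",tap"
        }
        { print }
    ' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Demo on a scratch file; the contents are made up for this example.
ini=$(mktemp)
cat > "$ini" <<'EOF'
[global]
HOST = http://localhost
[2]
WORKER_CLASS = qemu_x86_64
[3]
WORKER_CLASS = qemu_x86_64
[4]
WORKER_CLASS = qemu_x86_64
EOF
add_tap_class "$ini" 3 4
cat "$ini"
```

Only the sections named on the command line are changed, so instance 2 keeps its original class in the demo.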

successful tests:

so I restarted the other workers as well:

systemctl restart openqa-worker@{1..2} openqa-worker@{5..14}

Based on http://open.qa/docs/#_multi_machine_tests_setup and the history on aarch64, I assume this is what we would need to do for w4, which seems never to have been configured for MM:

zypper -n --no-refresh in firewalld openvswitch os-autoinst-openvswitch libcap-progs
systemctl enable --now firewalld openvswitch os-autoinst-openvswitch
echo 'OS_AUTOINST_USE_BRIDGE=br1' > /etc/sysconfig/os-autoinst-openvswitch
ovs-vsctl add-br br1
cat > /etc/sysconfig/network/ifcfg-tap0 <<EOF
BOOTPROTO='none'
IPADDR=''
NETMASK=''
PREFIXLEN=''
STARTMODE='auto'
TUNNEL='tap'
TUNNEL_SET_GROUP='nogroup'
TUNNEL_SET_OWNER='_openqa-worker'
EOF
for i in {1..14} {64..77} {128..141}; do echo OVS_BRIDGE_PORT_DEVICE_$i=\'tap$i\' ; done >> /etc/sysconfig/network/ifcfg-br1
for i in {1..14} {64..77} {128..141}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
firewall-cmd --set-default-zone=trusted
firewall-cmd --zone=trusted --add-masquerade
for i in br1 eth0 ovs-system; do firewall-cmd --zone=trusted --add-interface=$i; done
firewall-cmd --runtime-to-permanent
setcap CAP_NET_ADMIN=ep /usr/bin/qemu-system-x86_64
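
Before appending to ifcfg-br1 it may help to preview what the loop generates. The following is a dry-run sketch that only prints the bridge port lines. The function names `tap_ids` and `bridge_port_lines` are made up for this example, and `seq` is used instead of brace expansion so that it also runs in a plain POSIX shell.

```shell
# Same tap numbers as in the setup commands above: 1-14, 64-77, 128-141.
tap_ids() {
    seq 1 14; seq 64 77; seq 128 141
}

# Print the lines that the setup loop would append to ifcfg-br1.
bridge_port_lines() {
    for i in $(tap_ids); do
        echo "OVS_BRIDGE_PORT_DEVICE_$i='tap$i'"
    done
}

bridge_port_lines | sed -n '1,3p'                       # first three lines
echo "total: $(bridge_port_lines | grep -c '^OVS_BRIDGE_PORT_DEVICE_')"
```

The three ranges contain 14 numbers each, so 42 bridge ports are defined in total.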

#2 Updated by okurz over 1 year ago

  • Description updated (diff)

#3 Updated by okurz over 1 year ago

  • Related to action #64970: [desktop][opensuse][multi-machine] test fails in xrdp_client to connect to server added

#4 Updated by okurz over 1 year ago

apparently openqaworker7 is producing some problematic job results. E.g.

[28/03/2020 17:11:19] <DimStar> okurz: https://openqa.opensuse.org/tests/1216485#next_previous is more painful :)
[28/03/2020 17:11:36] <DimStar> success/failure ratio is far off
[28/03/2020 17:11:59] <DimStar> I thin 10 days ago is when we removed OW1, right?

It seems that desktopapps-remote-desktop-xrdp-client1 consistently fails on openqaworker7, so test reviewers retrigger failed tests until they happen to run on openqaworker1, which seems to be stable. DimStar also mentioned other problems on openqaworker7, such as https://openqa.opensuse.org/tests/1217710#step/kubeadm/1 . It could be something special about the firewall: "https://openqa.opensuse.org/tests/1217727#step/yast2_nfs4_server/37 - firewall might be sonething..or dns config", also on w7. I have removed "tap" from the worker class on openqaworker7 and restarted the worker instances. Let's see if this helps. https://openqa.opensuse.org/tests/1217710# is an interesting example because it is not a multi-machine test. Maybe we can look into this one first, as it should be easier to crosscheck.

Differences in configuration that I saw: on w1 only "br1" is in the "trusted" zone, whereas on w7 it is "br1 eth0 tap…", same as on aarch64. The configs also differ in "STARTMODE" and in the explicit "ZONE" in /etc/sysconfig/network/ifcfg-tap*

So now on w7 I did:

cat > /etc/sysconfig/network/ifcfg-tap0 <<EOF
BOOTPROTO='none'
IPADDR=''
NETMASK=''
PREFIXLEN=''
STARTMODE='auto'
TUNNEL='tap'
TUNNEL_SET_GROUP='nogroup'
TUNNEL_SET_OWNER='_openqa-worker'
ZONE='public'
EOF
for i in {1..20} {64..83} {128..147}; do ln -sf /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
for i in {0..20} {64..83} {128..147}; do firewall-cmd --zone=trusted --remove-interface=eth0; done
firewall-cmd --runtime-to-permanent
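
To double-check the symlink step without touching /etc, the same pattern can be tried in a scratch directory. This is a sketch only: the directory and the one-line stand-in for ifcfg-tap0 are assumptions for the example, and `seq` replaces the brace ranges so that it runs in a plain POSIX shell.

```shell
# Work in a scratch directory so nothing under /etc/sysconfig is modified.
dir=$(mktemp -d)
printf "TUNNEL='tap'\n" > "$dir/ifcfg-tap0"   # stand-in for the real file

# Same ranges as the ln -sf loop above: 1-20, 64-83, 128-147.
for i in $(seq 1 20) $(seq 64 83) $(seq 128 147); do
    ln -sf "$dir/ifcfg-tap0" "$dir/ifcfg-tap$i"
done

# Count the symlinks that were created.
links=$(find "$dir" -type l -name 'ifcfg-tap*' | grep -c .)
echo "symlinks: $links"
```

Because every ifcfg-tapN is a link to ifcfg-tap0, a later change to ifcfg-tap0 (such as the ZONE entry) applies to all tap devices at once.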

and looking into the "kubeadm" failure:

$ build=okurz_investigation_poo64700; for i in 1 7 ; do build=$build openqa-clone-set https://openqa.opensuse.org/tests/1217710 ${build}_kubeadm_w$i WORKER_CLASS=openqaworker$i; done

https://openqa.opensuse.org/tests/overview?build=okurz_investigation_poo64700

shows that 10/10 jobs on openqaworker1 and 10/10 jobs on openqaworker7 fail in the same way, so I reject the hypothesis that it's something specific to the MM setup on openqaworker7.

After the above changes I triggered some jobs again:

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/1218529 WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_yast2_nfs_v4_server

Created job #1219043: opensuse-Tumbleweed-DVD-x86_64-Build20200329-yast2_nfs_v4_server@64bit -> https://openqa.opensuse.org/t1219043

This is a single test out of an MM pair, and it works fine on its own.

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/1217787 WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_yast2_nfs_v4_client

Created job #1219049: opensuse-Tumbleweed-DVD-x86_64-Build20200327-yast2_nfs_v4_server@64bit -> https://openqa.opensuse.org/t1219049
Created job #1219050: opensuse-Tumbleweed-DVD-x86_64-Build20200327-yast2_nfs_v4_client@64bit -> https://openqa.opensuse.org/t1219050

which fail in https://openqa.opensuse.org/tests/1219050#step/yast2_nfs4_client/28

So let's check the basics again with wicked_basic:

$ openqa-clone-job --parental-inheritance --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/1218584 WORKER_CLASS=openqaworker7 BUILD=X _GROUP=0 TEST=okurz_poo64700_wicked_basic_sut

Created job #1219103: opensuse-Tumbleweed-DVD-x86_64-Build20200329-wicked_basic_ref@64bit -> https://openqa.opensuse.org/t1219103
Created job #1219104: opensuse-Tumbleweed-DVD-x86_64-Build20200329-wicked_basic_sut@64bit -> https://openqa.opensuse.org/t1219104

failed. https://openqa.opensuse.org/tests/1219104/file/serial_terminal.txt shows

# ping -c 1 10.0.2.2|| journalctl -b --no-pager > /dev/ttyS0; echo MWhDi-$?-
PING 10.0.2.2 (10.0.2.2) 56(84) bytes of data.
From 10.0.2.11 icmp_seq=1 Destination Host Unreachable
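
When the SUT cannot reach the gateway like this, a quick look at the host-side pieces configured earlier in this ticket might narrow things down. The following is a hedged sketch of such a checklist, not something that was run as part of this ticket. The helper names `try` and `mm_check` are made up, and the bridge name br1 and the qemu path are taken from the setup commands above.

```shell
# Run a command only if its binary exists, and never abort the checklist.
try() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "## $*"
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "## $1 not installed, skipped"
    fi
}

# Host-side checks for a multi-machine worker; call mm_check on the worker.
mm_check() {
    try systemctl is-active openvswitch os-autoinst-openvswitch
    try ovs-vsctl list-ports br1                        # tap devices attached to the bridge
    try firewall-cmd --get-active-zones                 # which interface is in which zone
    try firewall-cmd --zone=trusted --query-masquerade  # masquerading enabled?
    try getcap /usr/bin/qemu-system-x86_64              # expect cap_net_admin
}
```

The function is only defined here, and running `mm_check` on the worker host prints one section per command so that the output of w1 and w7 can be compared side by side.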

TODO read older tickets to remind myself, e.g. #30892 , #52499 , #55043 , #31978

#5 Updated by okurz over 1 year ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

Unfortunately I did not get beyond #64700#note-4. I didn't find time to refresh my memory of the old setup.

#6 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low

#7 Updated by okurz 9 months ago

  • Target version set to future
