action #52499

[aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required)

Added by ggardet_arm 5 months ago. Updated 2 months ago.

Status: Resolved
Start date: 03/06/2019
Priority: Normal
Due date:
Assignee: okurz
% Done: 0%
Category: Infrastructure
Target version: -
Difficulty:
Duration:

Description

wicked tests such as https://openqa.opensuse.org/tests/946871#settings never reach the running state.
I think this is because the test now requires a TAP worker.
So the aarch64 worker needs to be updated to handle TAP properly, and the worker class also needs to be updated.

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #51635: [network] test fails in t08_setup_second_card Resolved 20/05/2019
Related to openQA Tests - action #54281: [aarch64] test fails in wicked before_test - DNS problem Resolved 15/07/2019
Related to openQA Infrastructure - action #54785: tap devices not in any zone, error reported by firewalld Workable 29/07/2019
Blocked by openQA Tests - action #52559: [network] test fails in t01_basic to ping the other node Resolved 04/06/2019

History

#1 Updated by okurz 5 months ago

  • Category set to Infrastructure
  • Assignee set to asmorodskyi

@asmorodskyi, it seems you changed the test but the worker does not have "tap", is that right?

#2 Updated by okurz 5 months ago

  • Status changed from New to Feedback
  • Assignee changed from asmorodskyi to okurz

[03/06/2019 13:57:32] <okurz> riafarov: did *you* add `WORKER_CLASS=qemu_x86_64,tap` to wicked_basic_sut+wicked_basic_ref on o3? this is what the auditlog tells me however that does not sound like it would make any sense when we want to run it on aarch64 for example :)
[03/06/2019 13:58:41] <riafarov> okurz: yes I did that after talking to Anton
[03/06/2019 13:58:55] <riafarov> okurz: it's MM, so it's wrong to run it on aarch64
[03/06/2019 13:59:02] <okurz> riafarov: why?
[03/06/2019 13:59:08] <riafarov> okurz: and we have no idea how it managed to work there
[03/06/2019 13:59:37] <riafarov> okurz: do we have MM setup for arm?
[03/06/2019 14:01:28] <riafarov> okurz: for me it sounds like we should unschedule it for aarch64 then
[03/06/2019 14:01:33] <okurz> riafarov: Depends on what exactly qualifies as "MM setup" :) asmorodskyi and me also talked today and we agreed that probably "wicked_basic" relies on "basic multimachine" only – whatever that means but it works ;) So I will change it back and check that it properly works. Depending on when we add something like "wicked_advanced" we might see what's missing
[03/06/2019 14:02:40] <riafarov> okurz: it incompletes every time when executed on the wrong worker
[03/06/2019 14:02:58] <riafarov> okurz: RMs did a great job to retrigger it 5 times before it happens
[03/06/2019 14:03:05] <okurz> riafarov: ok, I will check that. 6 days ago it was fine though: https://openqa.opensuse.org/tests/944262#
[03/06/2019 14:03:23] <riafarov> okurz:  https://openqa.suse.de/tests/2943106/#step/boot_to_desktop/2
[03/06/2019 14:03:42] <riafarov> okurz: again, no idea how it manages to work
[03/06/2019 14:04:09] <riafarov> okurz: I've changed test suite setting to what is reasonable. If Anton is fine with your changes, feel free to revert
[03/06/2019 14:04:37] <okurz> riafarov: the example you mentioned was running covering openqaworker6+8, maybe an issue with the GRE tunnel which would allow "distributed multi-machine". When we are staying on the same host we should be fine
[03/06/2019 14:06:15] <riafarov> okurz: do not exclude 64bit runs from the equation
[03/06/2019 14:06:31] <riafarov> okurz: openqaworker4 doesn't support the scenario either
[03/06/2019 14:07:14] <riafarov> okurz: or openqaworker1 (do not remember which has tap device)
[03/06/2019 14:07:31] <riafarov> okurz: so in case you want to revert, also remove NICTYPE setting from the test suite
[03/06/2019 14:07:50] <riafarov> okurz: https://openqa.opensuse.org/tests/938212# here is failure on 64bit
[03/06/2019 14:08:38] <okurz> riafarov: yes, I get it now. ok, I will remove NICTYPE

So I set WORKER_CLASS=tap and removed NICTYPE=tap from both test suites "wicked_basic_sut" and "wicked_basic_ref".

Triggered for testing:

build=20190601; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=$build DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [947700, 947701, 947702],
  scheduled_product_id => 109314,
}

so waiting for https://openqa.opensuse.org/tests/947702

#3 Updated by okurz 5 months ago

  • Subject changed from [aarch64] wicked tests always in schedule state - tap worker required to [aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required)
  • Assignee changed from okurz to asmorodskyi

SUT failed to reach the parallel node, probably schrödinbug. We (asmorodskyi, riafarov, me) do not know why it could have ever worked as the aarch64 host does not have openvswitch configured which we probably need. asmorodskyi does not plan to support wicked_* in the near future. I have removed the scenarios from the aarch64 Tumbleweed as well as aarch64 Leap 15 for now.

However wicked_basic on x86_64 fails with what looks like the same problem: https://openqa.opensuse.org/tests/948247#step/t01_basic/420 so can you please look into that?

#4 Updated by ggardet_arm 5 months ago

Why did you remove it from aarch64? Especially when x86_64 fails in the same way.
Wicked is an important part of the testing on aarch64, so we should keep it.
We may need to configure aarch64 worker properly to support it.

#5 Updated by okurz 5 months ago

ggardet_arm wrote:

We may need to configure aarch64 worker properly to support it.

Yes, this is what the subject line states.

#6 Updated by okurz 5 months ago

  • Blocked by action #52559: [network] test fails in t01_basic to ping the other node added

#7 Updated by asmorodskyi 5 months ago

While working on this issue, keep #51635 in mind: a "proper" setup for wicked tests (even the basic one) means not just tap0, tap1, tap2, tap3 but also tapX+64 (e.g. tap64, tap65, tap66, tap67), because we use two interfaces in all extended tests and in one test case of wicked basic (t08).
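
As a rough illustration of the numbering described above, the following shell sketch lists the tap devices one worker instance would need for a given number of NICs. It assumes the convention from the openQA networking documentation that worker instance N uses tap(N-1) for its first NIC and that each additional NIC adds an offset of 64. The function name tap_devices_for_instance is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the tap devices for one worker instance.
# Assumption: instance N uses tap(N-1), and each extra NIC adds 64.
tap_devices_for_instance() {
    instance=$1
    nics=${2:-1}
    nic=0
    out=""
    while [ "$nic" -lt "$nics" ]; do
        out="$out tap$(( instance - 1 + 64 * nic ))"
        nic=$(( nic + 1 ))
    done
    # Strip the leading space before printing.
    echo "${out# }"
}
```

For example, tap_devices_for_instance 1 2 names the two devices the first worker instance needs for a test with two interfaces such as t08.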

#8 Updated by asmorodskyi 4 months ago

  • Status changed from Feedback to Workable

At the moment this ticket is about a pure OPS task: setting up MM for openQA on the aarch64 worker. From the tests' perspective everything is working as expected, so I am signing off from this issue for now. It will wait for a volunteer who dares to set up MM on the o3 aarch64 workers.

#9 Updated by asmorodskyi 4 months ago

  • Assignee deleted (asmorodskyi)

#10 Updated by okurz 4 months ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Priority changed from Urgent to Low

I guess we should live with the fact that the wicked tests were never workable on aarch64 as the setup is incomplete, so I am arguing that the issue should not be "Urgent". Taking it, reducing prio and waiting for the blocker.

#11 Updated by okurz 4 months ago

  • Status changed from Blocked to In Progress
  • Priority changed from Low to Normal

ggardet asked me again if we can not do it sooner, fine :)

So I debugged with asmorodskyi what the problem on openqaworker1 is and we found a misconfiguration in the firewall. The default zone was set to "trusted", however masquerading and the bridge were not in "trusted". Fixed that and adjusted the documentation.

I followed http://open.qa/docs/#_tap_based_network to apply the necessary steps on aarch64.o.o
One problem that was probably caused by this is that the live mode could not connect anymore. By temporarily disabling the firewall I could identify this as the culprit.
What seems to have happened is that we explicitly added br1 to the zone "external" whereas on openqaworker1 it is "default" which seems to be more permissive and allow the live handler connections.
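
A zone mismatch like the one described above can be spotted with a small check. This is only a sketch: the function name check_iface_zone and the FIREWALL_CMD override (used here so the check can be tried against a stub) are assumptions for this example. Only the firewall-cmd query options --get-default-zone and --get-zone-of-interface are used.

```shell
#!/bin/sh
# Hypothetical check: report an interface whose firewalld zone differs
# from the default zone. FIREWALL_CMD is an override hook for this sketch.
check_iface_zone() {
    iface=$1
    default_zone=$("${FIREWALL_CMD:-firewall-cmd}" --get-default-zone)
    # Treat a failed query as "no zone".
    iface_zone=$("${FIREWALL_CMD:-firewall-cmd}" --get-zone-of-interface="$iface" 2>/dev/null) || iface_zone="no zone"
    if [ "$iface_zone" = "$default_zone" ]; then
        echo "$iface: in default zone $default_zone"
    else
        echo "$iface: zone '$iface_zone' differs from default '$default_zone'"
        return 1
    fi
}
```

Running check_iface_zone br1 and check_iface_zone eth0 on both workers should make a difference like "external" versus the default zone visible at a glance.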

to support multi-nic tests:

for i in {64..69}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
for i in {128..133}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done

I wonder if it would need to be real files and not symlinks?
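
To try both variants, symlinks and real files, without touching /etc directly, something like the following could be used. It is a sketch under assumptions: the helper name make_tap_configs and its arguments are invented for this example, and unlike the loops above it creates relative links.

```shell
#!/bin/sh
# Hypothetical helper: create ifcfg-tapN files for additional NICs from
# ifcfg-tap0. Mode "link" creates symlinks, mode "copy" writes real files.
make_tap_configs() {
    dir=$1
    first=$2
    last=$3
    mode=${4:-link}
    i=$first
    while [ "$i" -le "$last" ]; do
        target="$dir/ifcfg-tap$i"
        if [ -e "$target" ] || [ -L "$target" ]; then
            echo "skipping existing $target" >&2
        elif [ "$mode" = copy ]; then
            cp "$dir/ifcfg-tap0" "$target"
        else
            # Relative link, so it stays valid if the directory is moved.
            ln -s ifcfg-tap0 "$target"
        fi
        i=$(( i + 1 ))
    done
}
```

make_tap_configs /etc/sysconfig/network 64 69 would correspond to the first loop above, and adding copy as fourth argument would produce real files instead.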

#12 Updated by okurz 4 months ago

  • Description updated (diff)
  • Status changed from In Progress to Blocked

waiting for asmorodskyi to do the debugging in #52559 and #51635

#13 Updated by okurz 4 months ago

  • Related to action #51635: [network] test fails in t08_setup_second_card added

#14 Updated by ggardet_arm 3 months ago

  • Related to action #54281: [aarch64] test fails in wicked before_test - DNS problem added

#15 Updated by okurz 3 months ago

Trying out something for debugging if we can actually ping the nameserver we configured:

openqa-clone-custom-git-refspec https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8044 https://openqa.opensuse.org/tests/991642

Created job #994813: opensuse-Tumbleweed-DVD-aarch64-Build20190724-wicked_basic_ref@aarch64 -> https://openqa.opensuse.org/t994813

nsinger and I assume that something with NAT (including the firewall) is off, as the tap devices and the bridge look fine but there is no traffic outside the virtual network.

#16 Updated by ggardet_arm 3 months ago

okurz wrote:

nsinger and I assume that something with NAT (including the firewall) is off, as the tap devices and the bridge look fine but there is no traffic outside the virtual network.

The physical interface is connected to the same bridge?

Maybe the output of ip a and ip route from the host would shed some light.

#17 Updated by okurz 3 months ago

  • Related to action #54785: tap devices not in any zone, error reported by firewalld added

#18 Updated by okurz 3 months ago

that I can easily provide :)

openqa-aarch64:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:18:85:04:00:e0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.112.3/24 brd 192.168.112.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::218:85ff:fe04:e0/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:18:85:05:00:e0 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:18:85:00:00:e0 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:18:85:01:00:e0 brd ff:ff:ff:ff:ff:ff
6: tap1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether b6:96:22:24:0f:08 brd ff:ff:ff:ff:ff:ff
7: tap2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 22:9f:f1:7b:99:18 brd ff:ff:ff:ff:ff:ff
8: tap3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether e6:37:77:92:54:3b brd ff:ff:ff:ff:ff:ff
9: tap4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 06:5e:1a:63:77:56 brd ff:ff:ff:ff:ff:ff
10: tap5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 6a:cc:87:d9:04:4c brd ff:ff:ff:ff:ff:ff
11: ovs-system: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7a:ae:cc:2c:17:6d brd ff:ff:ff:ff:ff:ff
12: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 46:0c:e3:a2:72:4b brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.2/15 brd 10.1.255.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::440c:e3ff:fea2:724b/64 scope link 
       valid_lft forever preferred_lft forever
13: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether a6:51:a0:51:cc:49 brd ff:ff:ff:ff:ff:ff
14: tap128: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether de:12:6a:b1:26:40 brd ff:ff:ff:ff:ff:ff
15: tap129: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether a6:64:5a:aa:45:48 brd ff:ff:ff:ff:ff:ff
16: tap130: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether c2:f9:56:fb:d8:ac brd ff:ff:ff:ff:ff:ff
17: tap131: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether c2:f9:bc:98:c7:4f brd ff:ff:ff:ff:ff:ff
18: tap132: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether b6:64:a2:35:30:84 brd ff:ff:ff:ff:ff:ff
19: tap133: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether e2:14:50:81:0b:c9 brd ff:ff:ff:ff:ff:ff
20: tap64: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 6e:8e:90:1a:e6:af brd ff:ff:ff:ff:ff:ff
21: tap65: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether ce:4a:7f:ba:e3:84 brd ff:ff:ff:ff:ff:ff
22: tap66: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 3a:29:ef:57:96:48 brd ff:ff:ff:ff:ff:ff
23: tap67: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 2a:7d:e1:7a:ef:64 brd ff:ff:ff:ff:ff:ff
24: tap68: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 06:24:ad:80:2a:4d brd ff:ff:ff:ff:ff:ff
25: tap69: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master ovs-system state DOWN group default qlen 1000
    link/ether 72:db:93:47:55:ee brd ff:ff:ff:ff:ff:ff
openqa-aarch64:~ # ip route
default via 192.168.112.254 dev eth0 
10.0.0.0/15 dev br1 proto kernel scope link src 10.0.2.2 
192.168.112.0/24 dev eth0 proto kernel scope link src 192.168.112.3
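
In the listing above every tap device has "master ovs-system" while eth0 has no master, so the physical interface is not attached to the bridge. A quick way to extract that from the output could look like this. It is a sketch, and the function name ifaces_on_master is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list interfaces attached to a given master device.
# Reads "ip -o link show" (or "ip a") output from stdin.
ifaces_on_master() {
    awk -v master="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "master" && $(i + 1) == master) {
                    name = $2
                    sub(/:$/, "", name)   # "tap1:" -> "tap1"
                    print name
                }
            }
        }'
}
```

Usage on the host: ip -o link show | ifaces_on_master ovs-system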

#19 Updated by okurz 3 months ago

the approach in #52499#note-15 did – maybe obviously – not work because we triggered only a single job but we would need the parallel one so let's try again:

build=20190728; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=okurz/os-autoinst-distri-opensuse#8044 DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/aarch64_mm NEEDLES_DIR=/var/lib/openqa/share/tests/opensuse/products/opensuse/needles PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [995247, 995248, 995249],
  scheduled_product_id => 114676,
}

https://openqa.opensuse.org/t995249

#20 Updated by okurz 3 months ago

failed because I used a "/" in BUILD, reported as #54809 .

build=20190728; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=okurz:os-autoinst-distri-opensuse#8044 DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/aarch64_mm NEEDLES_DIR=/var/lib/openqa/share/tests/opensuse/products/opensuse/needles PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [995839, 995840, 995841],
  scheduled_product_id => 114706,
}

https://openqa.opensuse.org/t995841

#21 Updated by okurz 3 months ago

So the creation job managed to create an HDD image but the downstream jobs fail to download it. Let's try to avoid the ':' as well:

build=20190728; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=okurz-os-autoinst-distri-opensuse#8044 DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/aarch64_mm NEEDLES_DIR=/var/lib/openqa/share/tests/opensuse/products/opensuse/needles PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [995847, 995848, 995849],
  scheduled_product_id => 114709,
}

https://openqa.opensuse.org/t995849

#22 Updated by okurz 3 months ago

<guillaume_g> okurz: failed again: https://openqa.opensuse.org/tests/995849#dependencies :(
<okurz> guillaume_g: seems that also the "#" is a problem, but not for the publishing, only the download in caching, see https://openqa.opensuse.org/tests/995900/file/autoinst-log.txt . interesting

Trying again without the #

build=20190728; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=okurz-os-autoinst-distri-opensuse-8044 DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/aarch64_mm NEEDLES_DIR=/var/lib/openqa/share/tests/opensuse/products/opensuse/needles PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [995901, 995902, 995903],
  scheduled_product_id => 114718,
}

https://openqa.opensuse.org/t995903
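
Since "/", ":" and "#" each caused trouble in BUILD, a custom build name could be sanitized before triggering. The following is a sketch only: the function name sanitize_build is hypothetical, and the set of characters kept (letters, digits, ".", "_", "-") is a conservative assumption, not a documented openQA rule.

```shell
#!/bin/sh
# Hypothetical helper: replace every character outside [A-Za-z0-9._-]
# with "-" so the value is safer to use as BUILD.
sanitize_build() {
    printf '%s' "$1" | sed 's/[^A-Za-z0-9._-]/-/g'
}
```

Applied to the first attempt, sanitize_build 'okurz/os-autoinst-distri-opensuse#8044' yields the value used in the last command above.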

#23 Updated by okurz 3 months ago

wrong syntax in https://openqa.opensuse.org/tests/995903#step/before_test/46 , trying again with an explicit array casting:

build=20190728; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=okurz-os-autoinst-distri-opensuse-8044 DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse.git#fix/aarch64_mm NEEDLES_DIR=/var/lib/openqa/share/tests/opensuse/products/opensuse/needles PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [995936, 995937, 995938],
  scheduled_product_id => 114728,
}

https://openqa.opensuse.org/t995938

#24 Updated by okurz 3 months ago

So the latest run shows the ping not working on aarch64, which is expected. Let's still test on x86_64 though.

Created new script for easier triggering: https://raw.githubusercontent.com/okurz/scripts/master/openqa-trigger-mm

env build=20190728 ~/bin/openqa-trigger-mm

->

{
  count => 3,
  failed => [],
  ids => [995963, 995964, 995965],
  scheduled_product_id => 114736,
}

-> https://openqa.opensuse.org/t995965
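
The idea of such a trigger script can be sketched as follows. This is not the content of the linked script: the function name mm_trigger_cmd is hypothetical, only a subset of the settings from the commands above is included, and deriving the ISO and repo names for other architectures by substituting the arch is an assumption. The function only prints the call so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical sketch: print the openqa-client call for the wicked_basic
# multi-machine pair. Only settings seen in the commands above are used.
mm_trigger_cmd() {
    build=$1
    arch=${2:-aarch64}
    repo="openSUSE-Tumbleweed-oss-$arch-Snapshot$build"
    echo openqa-client --host https://openqa.opensuse.org isos post \
        _NO_OBSOLETE_BUILD=1 "ARCH=$arch" "BUILD=$build" DISTRI=opensuse \
        FLAVOR=DVD VERSION=Tumbleweed \
        "ISO=openSUSE-Tumbleweed-DVD-$arch-Snapshot$build-Media.iso" \
        "REPO_OSS=$repo" \
        TEST=wicked_basic_ref,wicked_basic_sut
}
```

mm_trigger_cmd 20190728 prints the aarch64 variant, and the printed line can be copied into a shell once it looks right.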

#25 Updated by ggardet_arm 3 months ago

Contrary to the current test https://openqa.opensuse.org/tests/994576 where enp0s3 is up, in your aarch64 test, enp0s3 is not ready.

#26 Updated by okurz 3 months ago

Yes, I saw that as well. Interesting. But currently I am giving up because of the immature tooling. I tried to spawn an x86_64 set of tests and that again needs a properly initialized NEEDLES_DIR. Also, needing to skip the creating job is annoying. https://github.com/os-autoinst/openQA/pull/2224 might be of help there. For now I give up going further.

#27 Updated by ggardet_arm 3 months ago

With https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8067 we have some results in https://openqa.opensuse.org/tests/995987/file/serial_terminal.txt

And ping -c 1 8.8.8.8 fails, as does downloading from Google with curl -L 216.58.204.99

Routing inside the guest seems to be ok. So, the problem is the routing on the host.
Maybe we should add eth0 to the bridge, or configure br1 to route to eth0 when needed.

#28 Updated by okurz 3 months ago

  • Status changed from Blocked to Feedback

see #54281#note-3

https://openqa.opensuse.org/tests/999020# is soft-failed so I can move the scenarios to the product validation job group, which I did. Let's monitor for next jobs – and after worker restart.

Seems from history of root@aarch64.o.o that we never did firewall-cmd --permanent --zone=trusted --add-masquerade. I guess the documentation is a bit misleading to state "To enable masquerading one can use the following command:
firewall-cmd --permanent --zone=external --add-masquerade" because it needs to be the zone which has the external interface, which is "br1" in "trusted" in our case.
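
To avoid hard-coding the zone, the command can be derived from the zone that actually owns the interface. The following is a sketch: the function name masquerade_cmd_for and the FIREWALL_CMD override are assumptions for this example, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the command that enables
# masquerading in the zone that currently owns the given interface.
masquerade_cmd_for() {
    iface=$1
    zone=$("${FIREWALL_CMD:-firewall-cmd}" --get-zone-of-interface="$iface" 2>/dev/null) || zone=""
    if [ -z "$zone" ] || [ "$zone" = "no zone" ]; then
        echo "$iface is not assigned to any zone" >&2
        return 1
    fi
    echo "firewall-cmd --permanent --zone=$zone --add-masquerade"
}
```

With br1 in "trusted", masquerade_cmd_for br1 prints the command mentioned above. As it uses --permanent, a firewall-cmd --reload is needed afterwards for it to take effect.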

Whether the entries for OVS_BRIDGE_PORT_DEVICE_X start with "0" or "1" should not matter according to https://www.suse.com/documentation/sles-15/book_sle_admin/data/sec_network_openvswitch.html#sec_network_openvswitch_bridge as long as they are unique.

There are still warnings about tap devices not in any zone though.

To get rid of the warning about eth0 not being in any zone I called firewall-cmd --zone=external --add-interface=eth0. This wasn't necessary to fix the test though. https://openqa.opensuse.org/tests/999046# showed still working but the livehandler could not connect. Fixed that by moving to "trusted" zone: firewall-cmd --zone=trusted --change-interface=eth0 so we know that we need that for both liveview+MM now :)

On w1 I am now confused: yast2 firewall states that eth0 is in trusted, firewall-cmd --get-zone-of-interface eth0 states "no zone", which the warning confirms, and /etc/sysconfig/network/ifcfg-eth0 states "ZONE=public". I wonder if the zone in /etc/sysconfig/network/ifcfg-eth0 actually has any effect or if it is only used by SuSEfirewall2 but not by firewalld.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s1-networkscripts-interfaces doesn't mention the ZONE config. Maybe I can just delete it from the file if it is old, e.g. only meant for SuSEfirewall2, and not necessary anymore? I deleted the line ZONE=trusted from /etc/sysconfig/network/ifcfg-eth0 and rebooted the machine aarch64.o.o. The machine came up, ZONE=trusted is still absent from the file, firewall-cmd reports eth0 to be part of "trusted" and there is no warning in /var/log/firewalld. Retriggered https://openqa.opensuse.org/tests/999063 , still fine: live mode and test are working. Deleted ZONE=… from all files with sed -i -e '/ZONE=/d' /etc/sysconfig/network/* on aarch64.o.o , rebooted and tried again. https://openqa.opensuse.org/tests/999067 is also fine. Also added "ovs-system" to the trusted zone to fix a warning in /var/log/firewalld.
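
For other workers, a more cautious variant of the sed call above might be preferable. This sketch assumes a hypothetical helper strip_zone_lines; it only touches regular ifcfg-* files that contain a ZONE= line, keeps a .bak copy of each and skips symlinks such as the ifcfg-tapN links.

```shell
#!/bin/sh
# Hypothetical, more cautious variant of the sed call above.
strip_zone_lines() {
    dir=$1
    for f in "$dir"/ifcfg-*; do
        # Skip symlinks: GNU sed -i without --follow-symlinks would
        # replace them with regular files.
        if [ -L "$f" ] || [ ! -f "$f" ]; then continue; fi
        case "$f" in *.bak) continue ;; esac
        if grep -q '^ZONE=' "$f"; then
            sed -i.bak -e '/^ZONE=/d' "$f"
            echo "removed ZONE from $f"
        fi
    done
}
```

strip_zone_lines /etc/sysconfig/network would then report each file it changed, and the .bak files allow a quick rollback if firewalld behaves differently after the reboot.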

#29 Updated by okurz 2 months ago

  • Status changed from Feedback to Resolved

The jobs in https://openqa.opensuse.org/tests?match=wicked_basic are very stable now, and for both x86_64 and aarch64 the scenarios are scheduled in the corresponding product validation job group.
