action #52499

[aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required)

Added by ggardet_arm about 1 month ago. Updated 25 days ago.

Status: Blocked
Start date: 03/06/2019
Priority: Normal
Due date:
Assignee: okurz
% Done: 0%
Category: Infrastructure
Target version: -
Difficulty:
Duration:

Description

wicked tests: https://openqa.opensuse.org/tests/946871#settings never reach running state.
I think this is because this test now requires a TAP worker.
So the aarch64 worker needs to be updated to handle TAP properly, and the worker class also needs to be updated.

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #51635: [network] test fails in t08_setup_second_card Workable 20/05/2019
Related to openQA Tests - action #54281: [aarch64] test fails in wicked before_test - DNS problem New 15/07/2019
Blocked by openQA Tests - action #52559: [network] test fails in t01_basic to ping the other node Feedback 04/06/2019

History

#1 Updated by okurz about 1 month ago

  • Category set to Infrastructure
  • Assignee set to asmorodskyi

@asmorodskyi, it seems you changed the test, but the worker does not have "tap". Is that right?

#2 Updated by okurz about 1 month ago

  • Status changed from New to Feedback
  • Assignee changed from asmorodskyi to okurz

[03/06/2019 13:57:32] <okurz> riafarov: did *you* add `WORKER_CLASS=qemu_x86_64,tap` to wicked_basic_sut+wicked_basic_ref on o3? this is what the auditlog tells me however that does not sound like it would make any sense when we want to run it on aarch64 for example :)
[03/06/2019 13:58:41] <riafarov> okurz: yes I did that after talking to Anton
[03/06/2019 13:58:55] <riafarov> okurz: it's MM, so it's wrong to run it on aarch64
[03/06/2019 13:59:02] <okurz> riafarov: why?
[03/06/2019 13:59:08] <riafarov> okurz: and we have no idea how it managed to work there
[03/06/2019 13:59:37] <riafarov> okurz: do we have MM setup for arm?
[03/06/2019 14:01:28] <riafarov> okurz: for me it sounds like we should unschedule it for aarch64 then
[03/06/2019 14:01:33] <okurz> riafarov: Depends on what exactly qualifies as "MM setup" :) asmorodskyi and me also talked today and we agreed that probably "wicked_basic" relies on "basic multimachine" only – whatever that means but it works ;) So I will change it back and check that it properly works. Depending on when we add something like "wicked_advanced" we might see what's missing
[03/06/2019 14:02:40] <riafarov> okurz: it incompletes every time when executed on the wrong worker
[03/06/2019 14:02:58] <riafarov> okurz: RMs did a great job to retrigger it 5 times before it happens
[03/06/2019 14:03:05] <okurz> riafarov: ok, I will check that. 6 days ago it was fine though: https://openqa.opensuse.org/tests/944262#
[03/06/2019 14:03:23] <riafarov> okurz:  https://openqa.suse.de/tests/2943106/#step/boot_to_desktop/2
[03/06/2019 14:03:42] <riafarov> okurz: again, no idea how it manages to work
[03/06/2019 14:04:09] <riafarov> okurz: I've changed test suite setting to what is reasonable. If Anton is fine with your changes, feel free to revert
[03/06/2019 14:04:37] <okurz> riafarov: the example you mentioned was running covering openqaworker6+8, maybe an issue with the GRE tunnel which would allow "distributed multi-machine". When we are staying on the same host we should be fine
[03/06/2019 14:06:15] <riafarov> okurz: do not exclude 64bit runs from the equation
[03/06/2019 14:06:31] <riafarov> okurz: openqaworker4 doesn't support the scenario either
[03/06/2019 14:07:14] <riafarov> okurz: or openqaworker1 (do not remember which has tap device)
[03/06/2019 14:07:31] <riafarov> okurz: so in case you want to revert, also remove NICTYPE setting from the test suite
[03/06/2019 14:07:50] <riafarov> okurz: https://openqa.opensuse.org/tests/938212# here is failure on 64bit
[03/06/2019 14:08:38] <okurz> riafarov: yes, I get it now. ok, I will remove NICTYPE

So I set WORKER_CLASS=tap and removed NICTYPE=tap from both test suites "wicked_basic_sut" and "wicked_basic_ref".
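
To illustrate why the earlier `WORKER_CLASS=qemu_x86_64,tap` setting could keep aarch64 jobs in the scheduled state, here is a minimal sketch. It assumes a simplified model in which a worker can pick up a job only if it offers every class in the job's comma-separated WORKER_CLASS. The function name and the worker class lists are hypothetical, and this is not openQA's scheduler code.

```shell
# Simplified model (assumption): a job is assignable only when the worker
# offers every class named in the job's comma-separated WORKER_CLASS.
worker_matches() {
    job_classes=$1
    worker_classes=$2
    old_ifs=$IFS
    IFS=,
    set -- $job_classes            # split the job's classes on commas
    IFS=$old_ifs
    for class in "$@"; do
        case ",$worker_classes," in
            *",$class,"*) ;;       # class is offered, keep checking
            *) return 1 ;;         # class is missing, job stays scheduled
        esac
    done
    return 0
}

# Hypothetical worker class lists, for illustration only
worker_matches "qemu_x86_64,tap" "qemu_aarch64" || echo "stays scheduled"
worker_matches "tap" "qemu_aarch64,tap" && echo "can run"
```

Under this model, setting only `WORKER_CLASS=tap` lets any architecture run the job, provided the worker actually advertises `tap`.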

Triggered for testing:

build=20190601; openqa-client --host https://openqa.opensuse.org isos post _NO_OBSOLETE_BUILD=1 ARCH=aarch64 BUILD=$build DISTRI=opensuse FLAVOR=DVD ISO=openSUSE-Tumbleweed-DVD-aarch64-Snapshot$build-Media.iso MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo REPO_0=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_0_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo REPO_OSS=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-aarch64-Snapshot$build-debuginfo SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-aarch64-Snapshot$build VERSION=Tumbleweed TEST=wicked_basic_ref,wicked_basic_sut

->

{
  count => 3,
  failed => [],
  ids => [947700, 947701, 947702],
  scheduled_product_id => 109314,
}

so waiting for https://openqa.opensuse.org/tests/947702

#3 Updated by okurz about 1 month ago

  • Subject changed from [aarch64] wicked tests always in schedule state - tap worker required to [aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required)
  • Assignee changed from okurz to asmorodskyi

SUT failed to reach the parallel node, probably a schrödinbug. We (asmorodskyi, riafarov, me) do not know why it could ever have worked, as the aarch64 host does not have openvswitch configured, which we probably need. asmorodskyi does not plan to support wicked_* in the near future. I have removed the scenarios from aarch64 Tumbleweed as well as aarch64 Leap 15 for now.

However wicked_basic on x86_64 fails with what looks like the same problem: https://openqa.opensuse.org/tests/948247#step/t01_basic/420 so can you please look into that?

#4 Updated by ggardet_arm about 1 month ago

Why do you remove it from aarch64? Especially when x86_64 fails in the same way.
Wicked is an important part of the testing of aarch64, so we should keep it.
We may need to configure aarch64 worker properly to support it.

#5 Updated by okurz about 1 month ago

ggardet_arm wrote:

We may need to configure aarch64 worker properly to support it.

Yes, this is what the subject line states.

#6 Updated by okurz about 1 month ago

  • Blocked by action #52559: [network] test fails in t01_basic to ping the other node added

#7 Updated by asmorodskyi about 1 month ago

While working on this issue, keep #51635 in mind: a "proper" setup for wicked tests (even the basic one) actually means not just tap0, tap1, tap2, tap3 but also tapX+64 (e.g. tap64, tap65, tap66, tap67), because we use TWO interfaces in all extended tests and in one test case in wicked basic (t08).
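
The naming scheme can be sketched as follows. This assumes that worker instance N uses tap(N-1) for its first NIC and that each additional NIC adds an offset of 64. The function name is hypothetical, and the instance-to-tap mapping should be checked against the openQA documentation.

```shell
# Assumption: instance N uses tap(N-1) for NIC 0, and each further NIC adds
# an offset of 64 (the "tapX+64" scheme).
tap_devices() {
    instance=$1
    nics=$2
    nic=0
    while [ "$nic" -lt "$nics" ]; do
        echo "tap$((instance - 1 + nic * 64))"
        nic=$((nic + 1))
    done
}

tap_devices 1 2   # prints tap0 and tap64, one per line
tap_devices 4 2   # prints tap3 and tap67, one per line
```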

#8 Updated by asmorodskyi about 1 month ago

  • Status changed from Feedback to Workable

At the moment this ticket is about a pure OPS task: setting up MM for openQA on the aarch64 worker. From the tests' perspective everything is working as expected, so I am signing off from this issue for now. It will wait for a volunteer who dares to set up MM on the o3 aarch64 workers.

#9 Updated by asmorodskyi about 1 month ago

  • Assignee deleted (asmorodskyi)

#10 Updated by okurz about 1 month ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Priority changed from Urgent to Low

I guess we should live with the fact that the wicked tests were never workable on aarch64, as the setup is incomplete, so I am arguing that the issue should not be "Urgent". Taking it, reducing prio and waiting for the blocker.

#11 Updated by okurz 28 days ago

  • Status changed from Blocked to In Progress
  • Priority changed from Low to Normal

ggardet asked me again if we can not do it sooner, fine :)

So I debugged with asmorodskyi what the problem on openqaworker1 is, and we found a misconfiguration in the firewall. The default zone was set to "trusted", however masquerading and the bridge were not in "trusted". Fixed that and adjusted the documentation.

I followed http://open.qa/docs/#_tap_based_network to carry out the necessary setup steps on aarch64.o.o.
One problem that was probably caused by this is that the live mode could not connect anymore. By temporarily disabling the firewall I could identify this as the culprit.
What seems to have happened is that we explicitly added br1 to the zone "external", whereas on openqaworker1 it is in "default", which seems to be more permissive and allows the live handler connections.
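
A quick way to compare the zone setup between hosts could look like the sketch below. It only reads the state with standard `firewall-cmd` query options. The function name is hypothetical, and `br1` and `trusted` are simply the names mentioned above.

```shell
# Print the default zone, the zone an interface belongs to, and whether
# masquerading is enabled in "trusted". Read-only; typically run as root.
show_bridge_zone() {
    iface=${1:-br1}
    if ! command -v firewall-cmd >/dev/null 2>&1; then
        echo "firewall-cmd not found, nothing to inspect"
        return 0
    fi
    echo "default zone: $(firewall-cmd --get-default-zone 2>&1)"
    zone=$(firewall-cmd --get-zone-of-interface="$iface" 2>&1) || true
    echo "zone of $iface: $zone"
    echo "masquerade in trusted: $(firewall-cmd --zone=trusted --query-masquerade 2>&1)"
}

show_bridge_zone br1
```

Running this on both openqaworker1 and the aarch64 host should make a zone mismatch like the one described above visible at a glance.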

To support multi-NIC tests:

for i in {64..69}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done
for i in {128..133}; do ln -s /etc/sysconfig/network/ifcfg-tap{0,$i} ; done

I wonder if it would need to be real files and not symlinks?
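
If real files turned out to be needed, the symlinks could be replaced by independent copies along these lines. This is only a sketch: the function name is hypothetical, the demonstration uses a temporary directory with placeholder content, and it does not settle whether symlinks are actually a problem.

```shell
# Replace ifcfg-tapN symlinks in a directory with copies of their targets.
# Try this on a scratch copy before touching /etc/sysconfig/network.
materialize_tap_configs() {
    dir=$1
    for link in "$dir"/ifcfg-tap*; do
        [ -L "$link" ] || continue      # skip regular files such as ifcfg-tap0
        [ -e "$link" ] || continue      # skip dangling symlinks
        target=$(readlink -f "$link")
        rm "$link"
        cp "$target" "$link"
    done
}

# Demonstration in a temporary directory
demo=$(mktemp -d)
echo "# placeholder config" > "$demo/ifcfg-tap0"
ln -s "$demo/ifcfg-tap0" "$demo/ifcfg-tap64"
materialize_tap_configs "$demo"
[ -L "$demo/ifcfg-tap64" ] || echo "ifcfg-tap64 is now a regular file"
rm -r "$demo"
```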

#12 Updated by okurz 25 days ago

  • Description updated (diff)
  • Status changed from In Progress to Blocked

waiting for asmorodskyi to do the debugging in #52559 and #51635

#13 Updated by okurz 25 days ago

  • Related to action #51635: [network] test fails in t08_setup_second_card added

#14 Updated by ggardet_arm about 16 hours ago

  • Related to action #54281: [aarch64] test fails in wicked before_test - DNS problem added
