action #52559

[network] test fails in t01_basic to ping the other node

Added by okurz about 1 month ago. Updated 5 days ago.

Status: Feedback
Start date: 04/06/2019
Priority: High
Due date:
Assignee: okurz
% Done: 0%
Category: Bugs in existing tests
Target version: -
Difficulty:
Duration:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-wicked_basic_sut@64bit fails in t01_basic

Test suite description

Include basic sanity checks of wicked network framework
Maintainer: asmorodskyi@suse.de

Reproducible

Fails since (at least) Build 20190529, probably a recent regression as stated by asmorodskyi in the #opensuse-factory chat.

Expected result

Last good: 20190527 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Blocks openQA Tests - action #52499: [aarch64] Proper multi-machine test setup and wicked_basi... Blocked 03/06/2019
Blocks openQA Tests - action #51635: [network] test fails in t08_setup_second_card Workable 20/05/2019

History

#1 Updated by okurz about 1 month ago

  • Status changed from New to Workable
  • Assignee set to asmorodskyi

@asmorodskyi you wanted to look into this, right?

#2 Updated by okurz about 1 month ago

  • Blocks action #52499: [aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required) added

#3 Updated by asmorodskyi about 1 month ago

  • Assignee deleted (asmorodskyi)

#4 Updated by okurz about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

Fine, I will try again. On openqaworker1 I ran

echo 'zypper -n in libcap-progs && for i in /usr/bin/qemu-system-*; do setcap CAP_NET_ADMIN=ep $i ; done' | transactional-update shell && reboot

and retriggered wicked_basic, see https://openqa.opensuse.org/tests/958541# . Do we expect that to work now? What next?
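The capability step above can be written as a small reusable script. This is a minimal sketch, not what was actually run on openqaworker1: the function names `list_qemu_binaries` and `grant_net_admin` are hypothetical, and the `nullglob` guard is an addition so that `setcap` is never called on the literal pattern `qemu-system-*` when no QEMU binary is installed. `grant_net_admin` needs root and libcap-progs (on a transactional system, run it inside `transactional-update shell`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the setcap loop from this comment.
set -euo pipefail

# Print every qemu-system-* binary in the given directory (default /usr/bin),
# one per line. nullglob makes an unmatched pattern expand to nothing instead
# of the literal string "qemu-system-*".
list_qemu_binaries() {
    local dir="${1:-/usr/bin}"
    local bin
    shopt -s nullglob
    for bin in "$dir"/qemu-system-*; do
        printf '%s\n' "$bin"
    done
    shopt -u nullglob
}

# Set CAP_NET_ADMIN as effective + permitted on each QEMU binary so QEMU can
# configure tap devices (as used by multi-machine tests) without running as
# root. Requires root; setcap/getcap come from libcap-progs.
grant_net_admin() {
    local bin
    while IFS= read -r bin; do
        setcap cap_net_admin=ep "$bin"
        getcap "$bin"   # print the resulting capability set for checking
    done < <(list_qemu_binaries "${1:-/usr/bin}")
}
```

Calling `grant_net_admin` with no argument processes `/usr/bin`; `list_qemu_binaries` alone is a harmless way to preview which binaries would be touched.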

#5 Updated by okurz about 1 month ago

Moved the most recent jobs into the dev group so that they do not block any TW release:

for i in 953624 953742 958540 958541; do openqa-client --host https://openqa.opensuse.org jobs/$i put --json-data '{"group_id": 38}'; done

->

{ job_id => 953624 }
{ job_id => 953742 }
{ job_id => 958540 }
{ job_id => 958541 }

https://openqa.opensuse.org/tests/958701 retriggered.

Moved both "wicked_basic_ref" and "wicked_basic_sut" from the validation job group Tumbleweed to Development Tumbleweed. Also added the scenario for aarch64 in the dev group.
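The job-moving loop from this comment can be wrapped in a reusable function. This is a hedged sketch: `move_jobs_to_group`, `OPENQA_HOST` and `DRY_RUN` are hypothetical names, while the `openqa-client --host ... jobs/<id> put --json-data ...` call itself is copied from the command above. With `DRY_RUN=1` it only prints the command lines it would run, which is a cheap way to double-check the job IDs first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the openqa-client call used in this comment.
set -euo pipefail

OPENQA_HOST="${OPENQA_HOST:-https://openqa.opensuse.org}"

# Usage: move_jobs_to_group GROUP_ID JOB_ID...
# Moves each job into the given job group. When DRY_RUN=1, the openqa-client
# command lines are printed instead of executed.
move_jobs_to_group() {
    local group_id="$1"; shift
    local payload job
    # %d rejects a non-numeric group id instead of sending broken JSON.
    payload=$(printf '{"group_id": %d}' "$group_id")
    for job in "$@"; do
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            printf 'openqa-client --host %s jobs/%s put --json-data %s\n' \
                "$OPENQA_HOST" "$job" "$payload"
        else
            openqa-client --host "$OPENQA_HOST" "jobs/$job" put --json-data "$payload"
        fi
    done
}
```

For example, `move_jobs_to_group 38 953624 953742 958540 958541` repeats the move above, 38 being the dev group ID used in this comment.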

#6 Updated by okurz about 1 month ago

https://openqa.opensuse.org/tests/958701#step/before_test/80 fails in what looks like the same way. @asmorodskyi triggered some jobs with isos post to try out or debug things and then hit the problem that the triggered jobs lost their relation to the parent job and failed to load the HDD image even though it is referenced in the settings. He wants to continue looking into the issue later.

#7 Updated by okurz 27 days ago

  • Priority changed from Urgent to High

We looked into this together: the problem is that the former test relied on the qemu user network and also on incorrect DNS information.

Created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7710 to address this. It seemingly fixes the problem on x86_64, see https://openqa.opensuse.org/tests/962324 , but not yet on aarch64.

#8 Updated by asmorodskyi 27 days ago

IMO the problem described in this ticket is already solved. For x86_64 there are no issues at all, everything is working. For aarch64 the problem is different from what this ticket states, and for aarch64 we also have another ticket (linked to this one), so there is no point in tracking the same issue twice.

#9 Updated by okurz 27 days ago

Yes, agreed. I will close this ticket as soon as we have the original scenario validated in the product validation job group of Tumbleweed.

#10 Updated by okurz 26 days ago

  • Assignee changed from okurz to asmorodskyi

@asmorodskyi, the scenario is also still in the development job group, so this is not solved yet.

Now we have a failure in https://openqa.opensuse.org/tests/963181#step/t08_setup_second_card/175 .

In https://openqa.opensuse.org/tests/963181/file/serial_terminal.txt I can see a lot of errors, none of which seem to stop the test execution.

Could you take a look, please?
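To get an overview of the errors in a downloaded serial_terminal.txt, a rough sketch like the following may help. Assumptions: the log has been saved locally first (for example with `curl -sO` on the /file/serial_terminal.txt URL above), the helper name `summarize_errors` is hypothetical, and a case-insensitive match on "error" is only a crude heuristic that will also catch harmless lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize error lines in a saved openQA serial log.
set -euo pipefail

# Usage: summarize_errors FILE
# Prints the number of lines containing "error" (case-insensitive), then the
# distinct matching lines with their counts, most frequent first.
summarize_errors() {
    local file="$1"
    local count
    # grep -c prints 0 but exits 1 when nothing matches; keep going anyway.
    count=$(grep -ci 'error' "$file" || true)
    printf 'error lines: %s\n' "$count"
    grep -i 'error' "$file" | sort | uniq -c | sort -rn || true
}
```

Grouping identical lines with `uniq -c` makes it easier to see whether the noise comes from one repeated message or from many different failures.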

#11 Updated by okurz 25 days ago

  • Blocks action #51635: [network] test fails in t08_setup_second_card added

#12 Updated by asmorodskyi 25 days ago

  • Assignee deleted (asmorodskyi)

Ticket #51635 covers t08 exclusively. I don't want to track two tickets for the same issue.

#13 Updated by okurz 25 days ago

  • Status changed from In Progress to Feedback
  • Assignee set to okurz

@asmorodskyi please reference other tickets by #<id> and not the full URL to make use of the title and status preview. Also, "In Progress" without an assignee does not make much sense. So whatever ticket you prefer is fine for me. I will set this one to "Feedback" then and wait for your results in the other ticket to declare the scenario as stable before we can move it to group 1.

#14 Updated by ggardet_arm 5 days ago

It seems to be fixed, or am I missing something?

#15 Updated by okurz 5 days ago

ggardet_arm wrote:

It seems to be fixed, or am I missing something?

Well, I guess you have read the comment just above your question? Yes, the original problem is fixed; however, we moved the test scenario into the "development" job group until it is stable again, and currently the test scenario reproducibly fails on #51635. So my suggested chain of work (serialized) is:

  1. Fix #51635
  2. Prove that the whole scenario is stable (green/soft-fail)
  3. Move it from the "dev" group to the product validation group
  4. Prove that the scenario on aarch64 is stable as well
  5. Move it to the aarch64 product validation job group

My ETA: 4 weeks to 4 months, depending on when asmorodskyi plans to investigate the detailed problem and/or enjoys some summer holiday ;) To speed things up, you can of course adjust the scenario to skip the test module "t08_setup_second_card" and only execute the others, e.g.: add EXCLUDE_MODULES=t08_setup_second_card to the test suite along with a comment in the test suite description pointing to #51635, then add another test suite that does not exclude the failing module, add that one to the dev group and reference it in #51635.
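To illustrate the effect of the suggested setting, here is a sketch that filters a module schedule by an EXCLUDE_MODULES value. It assumes EXCLUDE_MODULES is a comma-separated list of test module names; that matches the usage suggested above but is not taken from the os-autoinst-distri-opensuse code, and the function name `filter_schedule` is hypothetical. The module names in the example are the ones mentioned in this ticket.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of EXCLUDE_MODULES semantics (assumed to be a
# comma-separated list of module names).
set -euo pipefail

# Usage: filter_schedule EXCLUDE_LIST MODULE...
# Prints each MODULE that does not appear in the comma-separated EXCLUDE_LIST.
filter_schedule() {
    local exclude=",$1,"; shift
    local module
    for module in "$@"; do
        # Wrapping both sides in commas avoids partial matches, e.g. an
        # exclude list of "t01" must not drop "t01_basic".
        if [[ "$exclude" != *",$module,"* ]]; then
            printf '%s\n' "$module"
        fi
    done
}
```

For example, `filter_schedule t08_setup_second_card before_test t01_basic t08_setup_second_card` prints only `before_test` and `t01_basic`, which is the reduced schedule the stable test suite would run while the second, unfiltered suite keeps exercising t08 in the dev group.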
