action #52559

[network] test fails in t01_basic to ping the other node

Added by okurz 5 months ago. Updated 3 months ago.

Status: Resolved
Start date: 04/06/2019
Priority: High
Due date:
Assignee: okurz
% Done: 0%
Category: Bugs in existing tests
Target version: -
Difficulty:
Duration:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-wicked_basic_sut@64bit fails in t01_basic.

Test suite description

Include basic sanity checks of wicked network framework
Maintainer: asmorodskyi@suse.de

Reproducible

Fails since (at least) Build 20190529, but this is probably a recent regression, as stated by asmorodskyi in the #opensuse-factory chat.

Expected result

Last good: 20190527 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Blocks openQA Tests - action #52499: [aarch64] Proper multi-machine test setup and wicked_basi... (Resolved, 03/06/2019)
Blocks openQA Tests - action #51635: [network] test fails in t08_setup_second_card (Resolved, 20/05/2019)

History

#1 Updated by okurz 5 months ago

  • Status changed from New to Workable
  • Assignee set to asmorodskyi

@asmorodskyi you wanted to look into this, right?

#2 Updated by okurz 5 months ago

  • Blocks action #52499: [aarch64] Proper multi-machine test setup and wicked_basic successfully tested (was: wicked tests always in schedule state - tap worker required) added

#3 Updated by asmorodskyi 4 months ago

  • Assignee deleted (asmorodskyi)

#4 Updated by okurz 4 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

Fine, I will try again. On openqaworker1 I ran:

echo 'zypper -n in libcap-progs && for i in /usr/bin/qemu-system-*; do setcap CAP_NET_ADMIN=ep $i ; done' | transactional-update shell && reboot

and retriggered wicked_basic, see https://openqa.opensuse.org/tests/958541# . Do we expect that to work now? What next?

#5 Updated by okurz 4 months ago

Moved the most recent jobs into the dev group so as not to block any TW release:

for i in 953624 953742 958540 958541; do openqa-client --host https://openqa.opensuse.org jobs/$i put --json-data '{"group_id": 38}'; done

->

{ job_id => 953624 }
{ job_id => 953742 }
{ job_id => 958540 }
{ job_id => 958541 }

https://openqa.opensuse.org/tests/958701 retriggered.

Moved both "wicked_basic_ref" and "wicked_basic_sut" from the validation job group Tumbleweed to Development Tumbleweed. Also added the scenario for aarch64 in the dev group.
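The one-off loop above can be wrapped in a small reusable function, so the same move can be repeated for other jobs or dry-run first. This is a minimal sketch, not part of openQA: the function name move_jobs_to_group and the OPENQA_CLIENT/OPENQA_HOST overrides are hypothetical; only the openqa-client invocation itself is taken from the command above.

```shell
# Hypothetical helper (not part of openQA): move openQA jobs into another job group.
# It issues the same "jobs/<id> put --json-data" call as the loop above, once per job.
# OPENQA_CLIENT and OPENQA_HOST are assumed overrides for this sketch;
# set OPENQA_CLIENT=echo to print the commands instead of running them.
move_jobs_to_group() {
    local group_id="$1"
    shift
    local client="${OPENQA_CLIENT:-openqa-client}"
    local host="${OPENQA_HOST:-https://openqa.opensuse.org}"
    local job_id
    for job_id in "$@"; do
        # Stop at the first failing call instead of silently continuing
        "$client" --host "$host" "jobs/$job_id" put \
            --json-data "{\"group_id\": $group_id}" || return 1
    done
}
```

For example, OPENQA_CLIENT=echo move_jobs_to_group 38 953624 953742 958540 958541 prints the four commands without contacting the server; dropping the override performs the actual move.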

#6 Updated by okurz 4 months ago

https://openqa.opensuse.org/tests/958701#step/before_test/80 fails with what looks like the same problem. @asmorodskyi triggered some jobs with "isos post" to try out and debug things, and then hit the problem that the triggered jobs lost their relation to the parent job and failed to load the HDD image, even though it is referenced in the settings. He wants to continue looking into the issue later.

#7 Updated by okurz 4 months ago

  • Priority changed from Urgent to High

We looked into this together: the problem is that the former test relied on the qemu user network and also used incorrect DNS information.

Created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/7710 to address this. It seemingly fixes the problem on x86_64, see https://openqa.opensuse.org/tests/962324 , but not yet on aarch64.

#8 Updated by asmorodskyi 4 months ago

IMO the problem described in this ticket is already solved. For x86_64 there are no issues at all; everything is working. For aarch64 the problem is different from what this ticket states, and for aarch64 we also have another ticket (linked to this one), so there is no point in tracking the same issue in two tickets.

#9 Updated by okurz 4 months ago

Yes, agreed. I will close this ticket as soon as we have the original scenario validated in the product validation job group of Tumbleweed.

#10 Updated by okurz 4 months ago

  • Assignee changed from okurz to asmorodskyi

@asmorodskyi also, the scenario is still in the development job group, so this is not solved yet.

Now we have a failure in https://openqa.opensuse.org/tests/963181#step/t08_setup_second_card/175 .

In https://openqa.opensuse.org/tests/963181/file/serial_terminal.txt I can see a lot of errors, none of which seem to stop the test execution.

Could you take a look, please?

#11 Updated by okurz 4 months ago

  • Blocks action #51635: [network] test fails in t08_setup_second_card added

#12 Updated by asmorodskyi 4 months ago

  • Assignee deleted (asmorodskyi)

Ticket #51635 covers t08 exclusively. I don't want to track two tickets for the same issue.

#13 Updated by okurz 4 months ago

  • Status changed from In Progress to Feedback
  • Assignee set to okurz

@asmorodskyi please reference other tickets by #<id> and not by the full URL, to make use of the title and status preview. Also, "In Progress" without an assignee does not make much sense. Whichever ticket you prefer is fine with me. I will set this one to "Feedback" then and wait for your results in the other ticket to declare the scenario stable before we can move it to group 1.

#14 Updated by ggardet_arm 3 months ago

It seems to be fixed, or am I missing something?

#15 Updated by okurz 3 months ago

ggardet_arm wrote:

It seems to be fixed, or am I missing something?

Well, I guess you have read the comment just above your question? Yes, the original problem is fixed; however, we moved the test scenario into the "development" job group until it is stable again, and currently the test scenario reproducibly fails on #51635. So my suggested chain of work (serialized) is:

  1. Fix #51635
  2. Prove that the whole scenario is stable (green/soft-fail)
  3. Move it from the "dev" group to the product validation group
  4. Prove that the scenario on aarch64 is stable as well
  5. Move it to the aarch64 product validation job group

My ETA: 4 weeks to 4 months, depending on when asmorodskyi plans to investigate the detailed problem and/or enjoys some summer holiday ;) To speed this up you can of course adjust the scenario to skip the test module "t08_setup_second_card" and only execute the others, e.g.: add EXCLUDE_MODULES=t08_setup_second_card to the test suite along with a comment in the test suite description pointing to #51635, then add another test suite that does not exclude the failing module, add that one to the dev group, and reference it in #51635.

#16 Updated by okurz 3 months ago

  • Status changed from Feedback to Blocked

Let's make it clearer that there is currently no work to do within this ticket by setting it to "Blocked". Blocked by #51635.

#17 Updated by okurz 3 months ago

  • Status changed from Blocked to Resolved

Resolved as per #51635#note-9; the jobs are in the product validation job group of openSUSE Tumbleweed again.
