action #15416

[tools] bridge device seems to have disappeared for HA tests

Added by okurz over 3 years ago. Updated about 2 years ago.

Status: Resolved
Start date: 16/09/2016
Priority: Normal
Due date:
Assignee: hsehic
% Done: 100%
Category: Infrastructure
Target version: -
Difficulty:
Duration:

Description

observation

https://openqa.suse.de/tests/662444/file/autoinst-log.txt says "07:33:38.2208 23535 QEMU: bridge br0 does not exist!" and the parallel jobs fail as a result, e.g. https://openqa.suse.de/tests/662442
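To spot this symptom across several job logs, something like the following could be used. It is only a minimal sketch: it assumes the message always has the form "bridge NAME does not exist" as in the line quoted above, and the function name `missing_bridges` is made up for illustration.

```python
import re

# Assumption: the QEMU message always looks like "bridge <name> does not exist",
# as in the autoinst-log.txt line quoted in this ticket.
MISSING_BRIDGE = re.compile(r"bridge (\S+) does not exist")


def missing_bridges(log_text: str) -> list[str]:
    """Return the sorted, de-duplicated bridge names reported as missing."""
    return sorted(set(MISSING_BRIDGE.findall(log_text)))


if __name__ == "__main__":
    sample = "07:33:38.2208 23535 QEMU: bridge br0 does not exist!"
    print(missing_bridges(sample))
```

The log text itself would have to be downloaded separately, e.g. from the autoinst-log.txt URL of each job.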

problem

H1. someone removed the bridge configuration
H2. qemu does not use/start the bridge anymore

further details

latest


Subtasks

action #13748: [ha] ocfs2 test spills out a lot of errors regardless of ... Resolved ldevulder


Related issues

Related to openQA Tests - action #18180: [ha][tools]change HA tests to use openvswitch configuration Resolved 30/03/2017
Duplicated by openQA Tests - action #15472: Unreachable network in HA hacluster Rejected 13/12/2016
Blocked by openQA Tests - action #14334: job incomplete: "could not configure /dev/net/tun (tap00)... Resolved 20/10/2016

History

#1 Updated by okurz over 3 years ago

  • Target version set to Milestone 4

#2 Updated by maritawerner over 3 years ago

  • Assignee set to okurz

#3 Updated by okurz over 3 years ago

  • Related to action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy" added

#4 Updated by okurz over 3 years ago

  • Assignee deleted (okurz)

Looking at https://openqa.suse.de/tests?hoursfresh=24&match=hacluster-supportserver one can find many incomplete jobs but also some passed ones. The interesting part is that sometimes the hacluster-supportserver test is "passed" even though the same symptom shows up and the parallel jobs still fail, e.g. https://openqa.suse.de/tests/672601/file/autoinst-log.txt. The last working one seems to be https://openqa.suse.de/tests/657200/file/autoinst-log.txt. Comparing the log files does not show any significant difference, e.g. the qemu parameters for the network devices are the same.
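Such a comparison of the network-related qemu parameters could be sketched roughly as follows. This assumes the full qemu command line of each job has already been extracted from its log as a single string. The helper names and the sample command lines in the usage part are hypothetical and not taken from the actual logs.

```python
import shlex


def network_args(cmdline: str) -> list[tuple[str, str]]:
    """Collect (flag, value) pairs that configure networking in a qemu command line."""
    tokens = shlex.split(cmdline)
    pairs = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-netdev", "-net"):
            pairs.append((flag, value))
        elif flag == "-device" and "netdev=" in value:
            # Only keep devices that are attached to a network backend.
            pairs.append((flag, value))
    return pairs


def diff_network_args(cmdline_a: str, cmdline_b: str):
    """Return the network arguments found only in A and only in B."""
    args_a, args_b = set(network_args(cmdline_a)), set(network_args(cmdline_b))
    return sorted(args_a - args_b), sorted(args_b - args_a)


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    good = "qemu-system-x86_64 -m 1024 -netdev tap,id=qanet0,ifname=tap00,script=no,downscript=no"
    bad = "qemu-system-x86_64 -m 1024 -netdev tap,id=qanet0,ifname=tap01,script=no,downscript=no"
    print(diff_network_args(good, bad))
```

Two empty lists mean the network configuration passed to qemu is identical, which would point towards the host setup rather than the job settings.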

Who has more information regarding bridge devices?

#5 Updated by nadvornik over 3 years ago

As I wrote in #14334, the TAP devices tap00, tap01 and tap02, and probably also the bridges, are special infrastructure created only for this test.

#6 Updated by okurz over 3 years ago

  • Related to deleted (action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy")

#7 Updated by okurz over 3 years ago

  • Blocked by action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy" added

#8 Updated by hsehic over 3 years ago

okurz wrote:

> observation
>
> https://openqa.suse.de/tests/662444/file/autoinst-log.txt says 07:33:38.2208 23535 QEMU: bridge br0 does not exist! and parallel jobs fail therefore, e.g. https://openqa.suse.de/tests/662442
>
> problem
>
> H1. someone removed the bridge configuration

As mentioned/discussed last week and as it seems, openqa worker network/bridge setup has been modified. All following issues are based on that fact. OpenQA Admin(s) should fix the infra/network setup and the show goes on...

> H2. qemu does not use/start the bridge anymore

not that we can confirm.

> further details
>
> latest

#9 Updated by okurz over 3 years ago

hsehic wrote:

> H1. someone removed the bridge configuration
>
> As mentioned/discussed last week and as it seems, openqa worker network/bridge setup has been modified.

Do you or anybody else know by whom and how?

> All following issues are based on that fact. OpenQA Admin(s) should fix the infra/network setup and the show goes on...

How can it be fixed?

#10 Updated by asmorodskyi about 3 years ago

There is physically no br0 on openqaworker3 (the only worker which runs HA tests), so the possible solutions are:
- reconfigure the HA tests to use br1 (this also needs a salt recipe for /etc/qemu-ifup-br1 on openqaworker3 because it does not exist there)
- switch the whole HA job group to use openvswitch (preferred way)
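Whether a given bridge is present on a worker can be checked with a small script like the one below. It is a sketch that relies on Linux exposing a `bridge` subdirectory under `/sys/class/net/NAME` for kernel bridge devices. The function name and the `sysfs_net` parameter are illustrative, and the check says nothing about openvswitch bridges.

```python
from pathlib import Path


def bridge_exists(name: str, sysfs_net: Path = Path("/sys/class/net")) -> bool:
    """Return True if `name` is a Linux kernel bridge device on this host."""
    # Kernel bridge devices have a "bridge" directory below their sysfs entry.
    return (sysfs_net / name / "bridge").is_dir()


if __name__ == "__main__":
    for candidate in ("br0", "br1"):
        state = "present" if bridge_exists(candidate) else "missing"
        print(f"{candidate}: {state}")
```

Run on openqaworker3, this would show directly which of br0 and br1 the HA tests could rely on.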

#11 Updated by okurz about 3 years ago

  • Duplicated by action #15472: Unreachable network in HA hacluster added

#12 Updated by okurz about 3 years ago

  • Assignee set to hsehic
  • Target version changed from Milestone 4 to Milestone 5

As discussed with hsehic, we will make one more attempt to fix it within the next milestone period. I am keeping this as urgent because it fails most of the time, and even when the supportserver test does not fail, the child jobs may still do so. It is expected to be fixed within the next few weeks. If it is still not fixed by M5, we should seriously ask ourselves whether it makes sense to run these tests at all.

#13 Updated by okurz about 3 years ago

any update on this?

#14 Updated by hsehic about 3 years ago

  • Status changed from New to In Progress

#16 Updated by okurz about 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: hacluster-alpha-node1
http://openqa.suse.de/tests/790620

#17 Updated by okurz about 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: hacluster-alpha-node1
http://openqa.suse.de/tests/790620

#18 Updated by okurz about 3 years ago

  • Target version changed from Milestone 5 to Milestone 6

#19 Updated by RBrownSUSE about 3 years ago

  • Subject changed from bridge device seems to have disappeared for HA tests to [tools] bridge device seems to have disappeared for HA tests

#21 Updated by asmorodskyi about 3 years ago

Because the jobs were constantly failing, they were moved to development: https://openqa.suse.de/admin/job_templates/79

#22 Updated by okurz about 3 years ago

  • Status changed from In Progress to New
  • Target version deleted (Milestone 6)

As far as I understood, hsehic sees himself as blocked here as well. I removed the target version because it probably will not be done within M6. I can't change the priority for unknown reasons, though.

#23 Updated by RBrownSUSE about 3 years ago

  • Status changed from New to In Progress
  • Assignee changed from hsehic to RBrownSUSE

#24 Updated by RBrownSUSE about 3 years ago

  • Status changed from In Progress to Resolved

bridge device created based on denis' documentation

This should be salted in the future - ticket https://progress.opensuse.org/issues/18178

Long term solution is to have HA tests using the traditional openQA openvswitch configuration - ticket https://progress.opensuse.org/issues/18180

Also found an issue in the HA tests where barrier clashes occurred when tests started too quickly. Tests to be modified. Richard volunteered - https://progress.opensuse.org/issues/18184

#25 Updated by okurz almost 3 years ago

  • Status changed from Resolved to In Progress
  • Assignee changed from RBrownSUSE to hsehic

As discussed with hsehic, reopened to note down what we changed.

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/2732 was merged to exclude the non-working HA-specific parts from the jobs for now.

The next step would be to add another test suite which sets 'HA_CLUSTER_TEST_ADVANCED' for tests.

Also, some changes to os-autoinst have been proposed to fix too-early access to barriers (upcoming os-autoinst change).

#26 Updated by RBrownSUSE almost 3 years ago

  • Status changed from In Progress to Resolved

The PR is merged, and even if it wasn't, this ticket is resolved. I recommend opening new tickets that are not tagged with the [tools] backlog for any other outstanding HA-specific issues.

#27 Updated by okurz almost 3 years ago

  • Related to action #18180: [ha][tools]change HA tests to use openvswitch configuration added
