action #15416
[tools] bridge device seems to have disappeared for HA tests (closed)
100%
Description
observation
https://openqa.suse.de/tests/662444/file/autoinst-log.txt says 07:33:38.2208 23535 QEMU: bridge br0 does not exist!
and parallel jobs fail therefore, e.g. https://openqa.suse.de/tests/662442
problem
H1. someone removed the bridge configuration
H2. qemu does not use/start the bridge anymore
further details
Updated by okurz almost 8 years ago
- Related to action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy" added
Updated by okurz almost 8 years ago
- Assignee deleted (okurz)
Looking at https://openqa.suse.de/tests?hoursfresh=24&match=hacluster-supportserver one can find many incompletes but also some passed jobs. The interesting part is that the hacluster-supportserver test is sometimes "passed" even though the same symptom shows up and the parallel jobs still fail, e.g. https://openqa.suse.de/tests/672601/file/autoinst-log.txt. The last working one seems to be https://openqa.suse.de/tests/657200/file/autoinst-log.txt. Comparing the log files does not show any significant difference, e.g. the qemu parameters for the network devices are the same.
Who has more information regarding bridge devices?
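Since the symptom can also show up in jobs reported as "passed", it may help to scan a downloaded autoinst-log.txt for the error directly instead of relying on the job result. A minimal sketch, assuming the log line format quoted in the description; the function name `find_missing_bridges` is hypothetical and not part of openQA or os-autoinst:

```python
import re

# Matches the QEMU error quoted in the description, e.g.
# "07:33:38.2208 23535 QEMU: bridge br0 does not exist!"
MISSING_BRIDGE = re.compile(r"QEMU: bridge (\S+) does not exist!")


def find_missing_bridges(log_text):
    """Return the sorted names of bridges reported as missing in an autoinst log."""
    return sorted({m.group(1) for m in MISSING_BRIDGE.finditer(log_text)})


if __name__ == "__main__":
    # Sample line copied from the description of this ticket.
    sample = "07:33:38.2208 23535 QEMU: bridge br0 does not exist!\n"
    print(find_missing_bridges(sample))
```

An empty result only means this particular error string is absent from the log; it does not prove that the network setup is working.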
Updated by nadvornik almost 8 years ago
As I wrote in #14334, the TAP devices tap00,tap01,tap02 and probably also the bridges are special infrastructure created only for this test.
Updated by okurz almost 8 years ago
- Related to deleted (action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy")
Updated by okurz almost 8 years ago
- Blocked by action #14334: job incomplete: "could not configure /dev/net/tun (tap00): Device or resource busy" added
Updated by hsehic almost 8 years ago
okurz wrote:
observation
https://openqa.suse.de/tests/662444/file/autoinst-log.txt says
07:33:38.2208 23535 QEMU: bridge br0 does not exist!
and parallel jobs fail therefore, e.g. https://openqa.suse.de/tests/662442
problem
H1. someone removed the bridge configuration
As mentioned/discussed last week and as it seems, openqa worker network/bridge setup has been modified. All following issues are based on that fact. OpenQA Admin(s) should fix the infra/network setup and the show goes on...
H2. qemu does not use/start the bridge anymore
not that we can confirm.
further details
Updated by okurz almost 8 years ago
hsehic wrote:
H1. someone removed the bridge configuration
As mentioned/discussed last week and as it seems, openqa worker network/bridge setup has been modified.
Do you or anybody else know by whom and how?
All following issues are based on that fact. OpenQA Admin(s) should fix the infra/network setup and the show goes on...
How can it be fixed?
Updated by asmorodskyi almost 8 years ago
There is physically no br0 on openqaworker3 (the only worker which runs the HA tests), so the possible solutions are:
- reconfigure the HA tests to use br1 (this also needs a salt recipe for /etc/qemu-ifup-br1 on openqaworker3 because it does not exist there)
- switch the whole HA job group to use openvswitch (preferred way)
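The state of a worker can be checked without touching its configuration. A minimal sketch, assuming a Linux host where a bridge interface exposes a `bridge` subdirectory under `/sys/class/net/<name>`; the helper names and the `sysfs_root` and `etc_dir` parameters are hypothetical, and the `/etc/qemu-ifup-<bridge>` pattern is extrapolated from the single path mentioned above:

```python
import os


def bridge_exists(name, sysfs_root="/sys/class/net"):
    """True if `name` is a network interface that the kernel treats as a bridge."""
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))


def ifup_script_exists(bridge, etc_dir="/etc"):
    """True if a qemu-ifup helper script for `bridge` exists (assumed naming pattern)."""
    return os.path.isfile(os.path.join(etc_dir, "qemu-ifup-" + bridge))


if __name__ == "__main__":
    # br0 is the bridge the HA tests expect, br1 the proposed alternative.
    for bridge in ("br0", "br1"):
        print(bridge, "bridge:", bridge_exists(bridge),
              "ifup script:", ifup_script_exists(bridge))
```

Run on openqaworker3, this would show at a glance which of the two options needs less work; it only reads from sysfs and /etc and changes nothing.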
Updated by okurz almost 8 years ago
- Has duplicate action #15472: Unreachable network in HA hacluster added
Updated by okurz almost 8 years ago
- Assignee set to hsehic
- Target version changed from Milestone 4 to Milestone 5
As discussed with hsehic, one more try to fix it within the next milestone period. I am keeping this as urgent because it fails most of the time, and even when the supportserver test does not fail, the children may still fail. Expected to be fixed within the next weeks. If it is still not fixed by M5, we should seriously ask ourselves whether it makes sense to run these tests at all.
Updated by okurz over 7 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: hacluster-alpha-node1
http://openqa.suse.de/tests/790620
Updated by okurz over 7 years ago
- Target version changed from Milestone 5 to Milestone 6
Updated by RBrownSUSE over 7 years ago
- Subject changed from bridge device seems to have disappeared for HA tests to [tools] bridge device seems to have disappeared for HA tests
Updated by dzedro over 7 years ago
Latest fail https://openqa.suse.de/tests/815762
Updated by asmorodskyi over 7 years ago
Because the jobs were constantly failing, they were moved to development: https://openqa.suse.de/admin/job_templates/79
Updated by okurz over 7 years ago
- Status changed from In Progress to New
- Target version deleted (Milestone 6)
As far as I understood, hsehic sees himself as blocked here as well. I removed the target version because it probably will not be done within M6. For unknown reasons I can't change the priority, though.
Updated by RBrownSUSE over 7 years ago
- Status changed from New to In Progress
- Assignee changed from hsehic to RBrownSUSE
Updated by RBrownSUSE over 7 years ago
- Status changed from In Progress to Resolved
bridge device created based on denis' documentation
This should be salted in the future - ticket https://progress.opensuse.org/issues/18178
Long term solution is to have HA tests using the traditional openQA openvswitch configuration - ticket https://progress.opensuse.org/issues/18180
Also found an issue in the HA tests where barrier clashes occurred when tests started too quickly. Tests to be modified. Richard volunteered - https://progress.opensuse.org/issues/18184
Updated by okurz over 7 years ago
- Status changed from Resolved to In Progress
- Assignee changed from RBrownSUSE to hsehic
As discussed with hsehic, reopened to note down what we changed.
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/2732 merged to exclude the non-working HA specific parts from jobs right now.
Next step would be to add another test suite which sets 'HA_CLUSTER_TEST_ADVANCED' for tests.
Also, some changes to os-autoinst proposed to fix too early access to barriers (upcoming os-autoinst change).
Updated by RBrownSUSE over 7 years ago
- Status changed from In Progress to Resolved
The PR is merged, and even if it wasn't, this ticket is resolved. For any other outstanding HA-specific issues I recommend new tickets that are not tagged with the [tools] backlog.
Updated by okurz over 7 years ago
- Related to action #18180: [ha][tools]change HA tests to use openvswitch configuration added