action #19044
Status: closed
[ha][scc]test fails in drbd
100%
Description
drbd and crm_mon tests fail and there are no successful runs in test history.
Please fix or move to development group.
Observation
openQA test in scenario sle-12-SP3-Server-DVD-HA-x86_64-hacluster-alpha-node1@64bit fails in drbd
Reproducible
Fails since (at least) Build 0137@0348
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 7 years ago
I don't know where the tests came from. I don't know who enabled them, but putting broken scenarios into the validation job group is not the right approach. I have removed hacluster-alpha-node1, hacluster-alpha-node2 and hacluster-supportserver from the validation job group again and added them to the test development group.
Updated by RBrownSUSE over 7 years ago
HA tests are the responsibility of QA HA. I don't think you should change their job group without discussing it with them.
Updated by okurz over 7 years ago
Ok.
@hsehic: do you read this? I would like to discuss with you where we can help you with review.
Updated by asmorodskyi over 7 years ago
- Subject changed from test fails in drbd to [ha][scc]test fails in drbd
Currently SLE Functional is reviewing the HA tests. I think we should choose one of two options:
- Keep the power to add/remove tests from the HA group and keep reviewing the HA group.
- Stop reviewing the HA tests on a regular basis; then we can leave the power to add/remove tests to the new person(s) responsible for reviewing the HA tests.
All other combinations do not make any sense to me.
Updated by hsehic over 7 years ago
Hi Oli,
okurz wrote:
Ok.
@hsehic: do you read this? I would like to discuss with you where we can help you with review.
Yes, I do read this, and thanks for the offer. Please ping me as soon as you have some spare time.
Updated by okurz over 7 years ago
Jobs are somehow ending up incomplete because multiple hacluster-supportserver jobs are spawned in parallel. The test suites alpha-node1 and alpha-node2 are not specified to run in parallel with each other, and I don't understand why. I am adding this to the test suites. Let's see.
alpha-node1: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node2
alpha-node2: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node1
Triggered with
$ openqa_client_osd isos post ARCH=x86_64 BETA=1 _NOOBSOLETEBUILD=1 BUILD=0165@0396 BUILD_HA=0165 BUILD_HA_GEO=0127 BUILD_SDK=0210 BUILD_SLE=0396 BUILD_WE=0124 DISTRI=sle FLAVOR=Server-DVD-HA MACHINE=64bit REPO_0=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1 SCC_REGCODE=30452ce234918d23 SCC_REGCODE_HA=223378a848e109bd SCC_URL=http://Server-0396.HA-0165.proxy.scc.suse.de VERSION=12-SP3 ISO=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1.iso ISO_2=SLE-12-SP3-HA-DVD-x86_64-Build0165-Media1.iso TEST=hacluster-supportserver,hacluster-alpha-node1,hacluster-alpha-node2
{ count => 4, failed => [], ids => [952379 .. 952382] }
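To see which jobs the new settings tie together, the PARALLEL_WITH values can be read as edges of a graph. The following is a minimal Python sketch of that idea only, assuming that any two suites linked by PARALLEL_WITH (in either direction) belong to the same parallel group; `parallel_clusters` is a hypothetical helper and this is not openQA's scheduler code.

```python
from typing import Dict, List


def parse_parallel_with(value: str) -> List[str]:
    """Split a comma-separated PARALLEL_WITH value into suite names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parallel_clusters(settings: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group suites connected through PARALLEL_WITH (simplified model: links are undirected)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for suite, variables in settings.items():
        find(suite)  # register suites that have no PARALLEL_WITH at all
        for other in parse_parallel_with(variables.get("PARALLEL_WITH", "")):
            union(suite, other)

    groups: Dict[str, List[str]] = {}
    for suite in parent:
        groups.setdefault(find(suite), []).append(suite)
    return sorted(sorted(group) for group in groups.values())


# Settings as added in this comment; the supportserver suite itself has none.
settings = {
    "hacluster-supportserver": {},
    "hacluster-alpha-node1": {"PARALLEL_WITH": "hacluster-supportserver,hacluster-alpha-node2"},
    "hacluster-alpha-node2": {"PARALLEL_WITH": "hacluster-supportserver,hacluster-alpha-node1"},
}
print(parallel_clusters(settings))
```

In this model all three suites end up in a single group, i.e. one supportserver serving both nodes. Whether openQA actually schedules them that way depends on its own dependency handling, which this sketch does not reproduce.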
Updated by okurz over 7 years ago
The jobs seem to have been triggered at the same time, but they are very unstable and fail often: https://openqa.suse.de/tests/952778 is a node job that ended up incomplete without any logs, while the supportserver is still running. Very broken. The worker logs say:
May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
May 23 14:31:09 openqaworker3 worker[3195]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
May 23 14:31:09 openqaworker3 worker[3195]: [INFO] cleaning up 00952778-sle-12-SP3-Server-DVD-HA-x86_64-Build0165@0396-hacluster-alpha-node2@64bit
So broken.
Also, jobs like https://openqa.suse.de/tests/952803/file/autoinst-log.txt that end up incomplete are not really helpful. I recommend improving the backend handling of this first, before wasting more time investigating incompletes with incomplete logs.
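When a job incompletes without an autoinst log, the worker journal is often the only evidence. A minimal Python sketch for pulling the error lines and the stop reason out of such an excerpt, assuming the syslog-style format shown above (timestamp, host, `worker[PID]:`, optional `[LEVEL]` tag); `parse_worker_log` and `abort_reason` are hypothetical helpers, not part of openQA.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape: "<Mon> <day> <hh:mm:ss> <host> <proc>[<pid>]: [LEVEL] message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?:\[(?P<level>[A-Z]+)\] )?(?P<msg>.*)$"
)
REASON_RE = re.compile(r"Reason: ([\w-]+)")


class Entry(NamedTuple):
    ts: str
    host: str
    level: Optional[str]  # None for untagged lines such as "killed 9505"
    msg: str


def parse_worker_log(lines: List[str]) -> List[Entry]:
    """Turn raw journal lines into entries, skipping lines that do not match."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            entries.append(Entry(m["ts"], m["host"], m["level"], m["msg"]))
    return entries


def abort_reason(entries: List[Entry]) -> Optional[str]:
    """Return the first 'Reason: ...' token the worker logged, if any."""
    for entry in entries:
        found = REASON_RE.search(entry.msg)
        if found:
            return found.group(1)
    return None


log = """\
May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
May 23 14:31:09 openqaworker3 worker[3195]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
May 23 14:31:09 openqaworker3 worker[3195]: [INFO] cleaning up 00952778-sle-12-SP3-Server-DVD-HA-x86_64-Build0165@0396-hacluster-alpha-node2@64bit
""".splitlines()

entries = parse_worker_log(log)
errors = [e for e in entries if e.level == "ERROR"]
print(len(errors), abort_reason(entries))
```

For this excerpt it prints `3 api-failure`. It only summarizes what the worker logged; it cannot tell why the web UI answered with 404, which is the backend problem described in this comment.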
Updated by ldevulder almost 7 years ago
- Status changed from New to Workable
- Assignee changed from hsehic to ldevulder
- % Done changed from 0 to 100
HA tests have been rewritten for SLE12-SP2+ and SLE15; they are now functional.
Changing the assignee, as I did most of the rewrite.
Updated by ldevulder almost 7 years ago
- Status changed from Workable to In Progress