action #19044
closed
[ha][scc]test fails in drbd
Added by mkravec over 7 years ago.
Updated almost 7 years ago.
Category:
Bugs in existing tests
Description
drbd and crm_mon tests fail and there are no successful runs in test history.
Please fix or move to development group.
Observation
openQA test in scenario sle-12-SP3-Server-DVD-HA-x86_64-hacluster-alpha-node1@64bit fails in
drbd
Reproducible
Fails since (at least) Build 0137@0348
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
I don't know where the tests came from or who enabled them, but putting broken scenarios into the validation job group is not the right approach. I have again removed hacluster-alpha-node1, hacluster-alpha-node2 and hacluster-supportserver from the validation job group and added them to the test development group.
HA tests are the responsibility of QA HA. I don't think you should be changing their job group without discussing it with them.
Ok.
@hsehic: do you read this? I would like to discuss with you where we can help you with review.
- Subject changed from test fails in drbd to [ha][scc]test fails in drbd
Currently SLE Functional is doing the review of the HA tests. I think we should choose one of two options:
- Keep the power to add/remove tests from the HA group and keep doing the review of the HA group
- Stop reviewing HA tests on a regular basis and leave the power to add/remove tests to the new person(s) responsible for reviewing HA tests.
Any other combination does not make sense to me.
Hi Oli,
okurz wrote:
Ok.
@hsehic: do you read this? I would like to discuss with you where we can help you with review.
Yes, I read this, and thanks for the offer. Please ping me as soon as you have some spare time.
Jobs are somehow ending up incomplete because multiple hacluster-supportserver jobs are spawned in parallel. The test suites alpha-node1 and alpha-node2 are not specified to run in parallel with each other, and I don't get why. I am adding this to the test suites. Let's see.
alpha-node1: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node2
alpha-node2: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node1
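To make the change above easier to compare against the previous settings, here is a minimal sketch that lists which test suites do not name each other in PARALLEL_WITH. The helper `unlinked_pairs` and the dictionary layout are hypothetical and not part of openQA; whether mutual references are what the scheduler actually needs is not confirmed in this ticket.

```python
from itertools import combinations


def unlinked_pairs(settings):
    """Return pairs of suites that do not both list each other.

    `settings` maps a test suite name to its PARALLEL_WITH value
    (a comma-separated string). Hypothetical helper, for illustration only.
    """
    peers = {
        suite: {name.strip() for name in value.split(",") if name.strip()}
        for suite, value in settings.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(peers), 2)
        if b not in peers[a] or a not in peers[b]
    ]


# Assumed state before the change: each node only names the supportserver.
before = {
    "hacluster-alpha-node1": "hacluster-supportserver",
    "hacluster-alpha-node2": "hacluster-supportserver",
}

# Settings as quoted in this comment.
after = {
    "hacluster-alpha-node1": "hacluster-supportserver,hacluster-alpha-node2",
    "hacluster-alpha-node2": "hacluster-supportserver,hacluster-alpha-node1",
}

print(unlinked_pairs(before))
print(unlinked_pairs(after))
```

The `before` mapping is an assumption reconstructed from the description; only the `after` values are taken from the settings quoted in this comment.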
Triggered with
$ openqa_client_osd isos post ARCH=x86_64 BETA=1 _NOOBSOLETEBUILD=1 BUILD=0165@0396 BUILD_HA=0165 BUILD_HA_GEO=0127 BUILD_SDK=0210 BUILD_SLE=0396 BUILD_WE=0124 DISTRI=sle FLAVOR=Server-DVD-HA MACHINE=64bit REPO_0=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1 SCC_REGCODE=30452ce234918d23 SCC_REGCODE_HA=223378a848e109bd SCC_URL=http://Server-0396.HA-0165.proxy.scc.suse.de VERSION=12-SP3 ISO=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1.iso ISO_2=SLE-12-SP3-HA-DVD-x86_64-Build0165-Media1.iso TEST=hacluster-supportserver,hacluster-alpha-node1,hacluster-alpha-node2
{ count => 4, failed => [], ids => [952379 .. 952382] }
The jobs were triggered at the same time, it seems, but they are very unstable and fail often. https://openqa.suse.de/tests/952778 is one node job that ended up incomplete without any logs while the supportserver is still running; very broken. The worker logs say:
May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
May 23 14:31:09 openqaworker3 worker[3195]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
May 23 14:31:09 openqaworker3 worker[3195]: [INFO] cleaning up 00952778-sle-12-SP3-Server-DVD-HA-x86_64-Build0165@0396-hacluster-alpha-node2@64bit
so broken.
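For triaging worker journal excerpts like the one above, a small sketch that splits each line into timestamp, host, level and message and then filters the ERROR entries. The regular expression is an assumption derived only from the lines quoted here, not a documented openQA log format, and `parse_worker_log` is a hypothetical helper.

```python
import re

# Pattern inferred from the journal lines quoted in this ticket (assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) worker\[(?P<pid>\d+)\]: "
    r"(?:\[(?P<level>[A-Z]+)\] )?(?P<msg>.*)$"
)


def parse_worker_log(text):
    """Return one dict per recognised line; unrecognised lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


SAMPLE = """\
May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
"""

entries = parse_worker_log(SAMPLE)
errors = [entry for entry in entries if entry["level"] == "ERROR"]
for entry in errors:
    print(entry["ts"], entry["msg"])
```

Lines without a bracketed level, such as `killed 9505`, are kept with `level` set to `None` so they are not lost when filtering.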
Also, jobs like https://openqa.suse.de/tests/952803/file/autoinst-log.txt ending up incomplete are not really helpful. I recommend improving the backend handling of this first, before wasting time investigating incompletes with incomplete logs all the time.
- Status changed from New to Workable
- Assignee changed from hsehic to ldevulder
- % Done changed from 0 to 100
HA tests have been rewritten for SLE12-SP2+ and SLE15; they are now functional.
Changing the assignee, as I did most of the rewrite.
- Status changed from Workable to In Progress
- Status changed from In Progress to Resolved