action #19044

[ha][scc]test fails in drbd

Added by mkravec over 2 years ago. Updated almost 2 years ago.

Status: Resolved
Start date: 09/05/2017
Priority: Normal
Due date:
Assignee: ldevulder
% Done: 100%
Category: Bugs in existing tests
Target version: -
Difficulty:
Duration:

Description

drbd and crm_mon tests fail and there are no successful runs in test history.
Please fix or move to development group.

Observation

openQA test in scenario sle-12-SP3-Server-DVD-HA-x86_64-hacluster-alpha-node1@64bit fails in drbd

Reproducible

Fails since (at least) Build 0137@0348

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by okurz over 2 years ago

I don't know where these tests came from or who enabled them, but putting broken scenarios into the validation job group is not the right approach. I have again removed hacluster-alpha-node1, hacluster-alpha-node2 and hacluster-supportserver from the validation job group and added them to the test development group.

#2 Updated by RBrownSUSE over 2 years ago

HA tests are the responsibility of QA HA. I don't think you should be changing their job group without discussing it with them.

#3 Updated by okurz over 2 years ago

Ok.

@hsehic: do you read this? I would like to discuss with you where we can help you with review.

#4 Updated by asmorodskyi over 2 years ago

  • Subject changed from test fails in drbd to [ha][scc]test fails in drbd

Currently SLE Functional is doing the review of the HA tests. I think we should choose one of two options:

  1. Keep the power to add/remove tests from the HA group and keep reviewing the HA group.
  2. Stop reviewing HA tests on a regular basis and hand the power to add/remove tests to the new person(s) responsible for reviewing HA tests.

Any other combination does not make sense to me.

#5 Updated by hsehic over 2 years ago

Hi Oli,
okurz wrote:

Ok.


@hsehic: do you read this? I would like to discuss with you where we can help you with review.

Yes, I read this, and thanks for the offer. Please ping me as soon as you have some spare time.

#6 Updated by okurz over 2 years ago

Jobs are somehow ending up incomplete because multiple hacluster-supportserver jobs are spawned in parallel. The test suites alpha-node1 and alpha-node2 are not specified to run in parallel with each other, and I don't understand why. I am adding this to the test suites. Let's see.

alpha-node1: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node2
alpha-node2: PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node1
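
To see which suites these two settings tie together, here is a minimal Python sketch. It is not openQA code: it only parses the two PARALLEL_WITH values quoted above and walks the resulting references. Expanding "alpha-node1" and "alpha-node2" to the full suite names, and the helper names `parse_parallel_with` and `parallel_group`, are assumptions made for illustration.

```python
from collections import defaultdict

# The two settings quoted above. Expanding the short names to the full
# suite names (hacluster-alpha-node1/2) is an assumption of this sketch.
SETTINGS = {
    "hacluster-alpha-node1": "PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node2",
    "hacluster-alpha-node2": "PARALLEL_WITH=hacluster-supportserver,hacluster-alpha-node1",
}


def parse_parallel_with(setting):
    """Split 'PARALLEL_WITH=a,b' into the list ['a', 'b']."""
    key, _, value = setting.partition("=")
    if key != "PARALLEL_WITH":
        raise ValueError(f"not a PARALLEL_WITH setting: {setting!r}")
    return [name.strip() for name in value.split(",") if name.strip()]


def parallel_group(settings, start):
    """Return every suite reachable from `start` via PARALLEL_WITH references.

    References are treated as undirected links here, which is a
    simplification chosen for this sketch.
    """
    graph = defaultdict(set)
    for suite, setting in settings.items():
        for other in parse_parallel_with(setting):
            graph[suite].add(other)
            graph[other].add(suite)

    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node] - seen)
    return seen


print(sorted(parallel_group(SETTINGS, "hacluster-alpha-node1")))
```

With the settings above, all three suites end up in one group, which is the intent of the change.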

Triggered with

$ openqa_client_osd isos post ARCH=x86_64 BETA=1 _NOOBSOLETEBUILD=1 BUILD=0165@0396 BUILD_HA=0165 BUILD_HA_GEO=0127 BUILD_SDK=0210 BUILD_SLE=0396 BUILD_WE=0124 DISTRI=sle FLAVOR=Server-DVD-HA MACHINE=64bit REPO_0=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1 SCC_REGCODE=30452ce234918d23 SCC_REGCODE_HA=223378a848e109bd SCC_URL=http://Server-0396.HA-0165.proxy.scc.suse.de  VERSION=12-SP3 ISO=SLE-12-SP3-Server-DVD-x86_64-Build0396-Media1.iso ISO_2=SLE-12-SP3-HA-DVD-x86_64-Build0165-Media1.iso TEST=hacluster-supportserver,hacluster-alpha-node1,hacluster-alpha-node2
{ count => 4, failed => [], ids => [952379 .. 952382] }

#7 Updated by okurz over 2 years ago

The jobs seem to be triggered at the same time, but they are very unstable and fail often. In https://openqa.suse.de/tests/952778 one node job is incomplete without any logs while the supportserver is still running, which is very broken. The worker logs say:

May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
May 23 14:31:09 openqaworker3 worker[3195]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
May 23 14:31:09 openqaworker3 worker[3195]: [INFO] cleaning up 00952778-sle-12-SP3-Server-DVD-HA-x86_64-Build0165@0396-hacluster-alpha-node2@64bit

so broken.
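
For anyone triaging similar incompletes, a minimal Python sketch for pulling the error lines out of such a worker journal excerpt. The regular expression is an assumption derived only from the seven lines quoted above, not a documented log format, and the names `LOG_LINE`, `SAMPLE` and `parse` are made up for this example.

```python
import re

# Pattern inferred from the excerpt above (assumption, not a documented
# format): "<Mon DD HH:MM:SS> <host> worker[<pid>]: [<LEVEL>] <message>".
# The level is optional because "killed 9505" carries none.
LOG_LINE = re.compile(
    r"^(?P<time>\w{3} +\d+ [\d:]+) (?P<host>\S+) worker\[(?P<pid>\d+)\]: "
    r"(?:\[(?P<level>[A-Z]+)\] )?(?P<message>.*)$"
)

# The worker log lines exactly as quoted in this comment.
SAMPLE = """\
May 23 14:25:17 openqaworker3 worker[3195]: [INFO] 9505: WORKING 952778
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:27 openqaworker3 worker[3195]: [ERROR] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
May 23 14:30:28 openqaworker3 worker[3195]: [ERROR] 404 response: Not Found (remaining tries: 0)
May 23 14:30:28 openqaworker3 worker[3195]: killed 9505
May 23 14:31:09 openqaworker3 worker[3195]: [DEBUG] Either there is no job running or we were asked to stop: (1|Reason: api-failure)
May 23 14:31:09 openqaworker3 worker[3195]: [INFO] cleaning up 00952778-sle-12-SP3-Server-DVD-HA-x86_64-Build0165@0396-hacluster-alpha-node2@64bit
"""


def parse(lines):
    """Turn matching journal lines into dicts; silently skip anything else."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


records = parse(SAMPLE.splitlines())
errors = [r for r in records if r["level"] == "ERROR"]
for r in errors:
    print(r["time"], r["message"])
```

In this excerpt the first 404 arrives about five minutes after the job started, at the same second as the "Job aborted" message.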

Also, jobs like https://openqa.suse.de/tests/952803/file/autoinst-log.txt ending up incomplete are not really helpful. I recommend improving the backend handling of this first, before wasting time investigating incompletes with incomplete logs all the time.

#8 Updated by ldevulder almost 2 years ago

  • Status changed from New to Workable
  • Assignee changed from hsehic to ldevulder
  • % Done changed from 0 to 100

HA tests have been rewritten for SLE12-SP2+ and SLE15; they are now functional.
Changing the assignee, as I did most of the rewrite.

#9 Updated by ldevulder almost 2 years ago

  • Status changed from Workable to In Progress

#10 Updated by ldevulder almost 2 years ago

  • Status changed from In Progress to Resolved

Resolved!
