Project

General

Profile

action #80532

[qe-asg][HAE] Test fails in cluster_md: unexpected node fence

Added by acarvajal 8 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
2020-11-27
Due date:
% Done:

0%

Estimated time:
Difficulty:
hard

Description

Observation

openQA test fails while configuring a cluster_md resource in a MM test environment:

https://openqa.suse.de/tests/5069278#step/cluster_md/25 (alpha cluster on x86_64)
https://openqa.suse.de/tests/5067522#step/cluster_md/25 (alpha cluster on ppc64le)
https://openqa.suse.de/tests/5066189#step/cluster_md/25 (delta cluster on ppc64le)
https://openqa.suse.de/tests/5082270#step/cluster_md/25 (delta cluster on x86_64)
https://openqa.suse.de/tests/5066990#step/cluster_md/25 (gamma cluster on x86_64)

Test Suite Description.

This is affecting test suites for:

  • alpha cluster: 2 node cluster, with SBD, DLM, lvmlockd, cluster_md+OCFS2, drbd_passive+XFS, fencing test with sysrq and remove_node
  • delta cluster: 3 node cluster, with diskless SBD, DLM, lvmlockd, cluster_md+OCFS2, fencing test with crmsh
  • gamma cluster: 3 node cluster, with SBD, DLM, lvmlockd, cluster_md+OCFS2, fencing test with crmsh

Maintainer: QA-ASG

Impacted Architectures

Sporacic failures on alpha, delta and gamma tests over x86_64 and ppc64le with qemu backend.

Reproducible

Fails since 15-SP3 Build 51.1, but issue is sporadic. Test can pass the cluster_md test module after re-triggering.

Expected Result

Alpha Cluster in x86_64: https://openqa.suse.de/tests/5070248 & https://openqa.suse.de/tests/5070249
Alpha Cluster in ppc64le: https://openqa.suse.de/tests/5070260 & https://openqa.suse.de/tests/5070261

History

#1 Updated by okurz 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ha_beta_node02
https://openqa.suse.de/tests/5179022

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#2 Updated by jadamek 7 months ago

  • Status changed from New to Resolved

I don't know why, but this time, I was able to reproduce the bug on my lab, thanks to this, it was easier to debug.
There was a conflict when the cluster md resource agent was configured whereas the md device was already started in the nodes. I removed the manual start of the md device from the test and after checking the official HA documentation, this is how it should be done.

Also available in: Atom PDF