[qe-asg][HAE] Test fails in cluster_md: unexpected node fence
openQA test fails while configuring a cluster_md resource in a MM test environment:
https://openqa.suse.de/tests/5069278#step/cluster_md/25 (alpha cluster on x86_64)
https://openqa.suse.de/tests/5067522#step/cluster_md/25 (alpha cluster on ppc64le)
https://openqa.suse.de/tests/5066189#step/cluster_md/25 (delta cluster on ppc64le)
https://openqa.suse.de/tests/5082270#step/cluster_md/25 (delta cluster on x86_64)
https://openqa.suse.de/tests/5066990#step/cluster_md/25 (gamma cluster on x86_64)
Test Suite Description.¶
This is affecting test suites for:
- alpha cluster: 2 node cluster, with SBD, DLM, lvmlockd, cluster_md+OCFS2, drbd_passive+XFS, fencing test with sysrq and remove_node
- delta cluster: 3 node cluster, with diskless SBD, DLM, lvmlockd, cluster_md+OCFS2, fencing test with crmsh
- gamma cluster: 3 node cluster, with SBD, DLM, lvmlockd, cluster_md+OCFS2, fencing test with crmsh
Sporacic failures on alpha, delta and gamma tests over x86_64 and ppc64le with qemu backend.
Fails since 15-SP3 Build 51.1, but issue is sporadic. Test can pass the cluster_md test module after re-triggering.
Alpha Cluster in x86_64: https://openqa.suse.de/tests/5070248 & https://openqa.suse.de/tests/5070249
Alpha Cluster in ppc64le: https://openqa.suse.de/tests/5070260 & https://openqa.suse.de/tests/5070261
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: ha_beta_node02
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
- Status changed from New to Resolved
I don't know why, but this time, I was able to reproduce the bug on my lab, thanks to this, it was easier to debug.
There was a conflict when the cluster md resource agent was configured whereas the md device was already started in the nodes. I removed the manual start of the md device from the test and after checking the official HA documentation, this is how it should be done.