action #116191

[qe-sap] HA Migration Verification from 15-SP3 jobs fail in check_after_reboot

Added by acarvajal 3 months ago. Updated 24 days ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2022-09-02
Due date: -
% Done: 100%
Estimated time: -
Difficulty: -

Description

Observation

openQA test in scenario sle-15-SP5-Online-ppc64le-migration_online_verify_sle15sp3_ha_alpha_node01@ppc64le fails in
check_after_reboot

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 19.1

Further details

Always latest result in this scenario: latest

History

#1 Updated by acarvajal 3 months ago

Verifying the state of the cluster in the hb_report shows that the cluster_md resource failed to start because /dev/md/cluster_md was not present in the system.

Jobs were restarted and taken over via openQA's developer mode to verify the state of the cluster_md-related resources:

  • /dev/md/cluster_md was not present, as reported by the journal and the hb_report.
  • The backing iSCSI devices were connected.
  • /etc/mdadm.conf was present, and its DEVICE line included the paths of the 2 iSCSI devices that existed in the system.
  • A call to mdadm --assemble --scan failed with return code 2 but printed no error messages.
  • A call to mdadm --assemble --scan --verbose reported that the /dev/md/cluster_md RAID could not be initialized because the configured UUID was different from the actual UUID.
  • Checking with blkid confirmed that the UUID reported by both devices in the MD RAID was different from the one configured in /etc/mdadm.conf.
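
The check in the last two bullets boils down to comparing the UUID configured in /etc/mdadm.conf against the UUID blkid reports for the backing devices. A minimal sketch of that comparison follows; all device names, file contents and UUIDs are made-up sample values, not taken from the failing jobs. Note that blkid prints RAID-member UUIDs with dashes while mdadm.conf uses colons, so separators are stripped before comparing.

```shell
# Pull the UUID= value out of an mdadm.conf ARRAY line.
extract_conf_uuid() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-f:]*\).*/\1/p' "$1"
}

# Pull the UUID="..." value out of one line of blkid output.
extract_blkid_uuid() {
    sed -n 's/.*UUID="\([^"]*\)".*/\1/p'
}

# Sample mdadm.conf (illustrative values only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
DEVICE /dev/sdb /dev/sdc
ARRAY /dev/md/cluster_md metadata=1.2 UUID=11111111:22222222:33333333:44444444
EOF

configured=$(extract_conf_uuid "$conf")

# Sample blkid output for one backing device (illustrative):
actual=$(printf '%s\n' '/dev/sdb: UUID="aaaaaaaa-bbbbbbbb-cccccccc-dddddddd" TYPE="linux_raid_member"' | extract_blkid_uuid)

# Compare with separators stripped; a mismatch is what broke assembly here.
if [ "$(printf '%s' "$configured" | tr -d ':-')" != "$(printf '%s' "$actual" | tr -d ':-')" ]; then
    echo "UUID mismatch: configured=$configured actual=$actual"
fi

rm -f "$conf"
```

On a real node the input would of course be the live /etc/mdadm.conf and the actual `blkid` output for the iSCSI devices.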

The issue was fixed by:

  • Updating the UUID in /etc/mdadm.conf with the one reported by blkid.
  • Syncing /etc/mdadm.conf to the other node with csync2.
  • Cleaning up the cluster_md resource with crm resource cleanup cluster_md.

This can be used as the basis for a workaround in the ha/check_after_reboot test module.
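
The fix steps above can be sketched as follows. The sed rewrite operates on a throwaway copy here; on the real node it would target /etc/mdadm.conf. The csync2 and crm commands need a live HA cluster, so they are shown as comments. All UUIDs are made-up sample values.

```shell
# Sample mdadm.conf with a stale UUID (illustrative values only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
ARRAY /dev/md/cluster_md metadata=1.2 UUID=11111111:22222222:33333333:44444444
EOF

# UUID actually reported by blkid on the backing devices (assumed value):
new_uuid="aaaaaaaa:bbbbbbbb:cccccccc:dddddddd"

# 1. Replace the stale UUID in mdadm.conf.
sed -i "s/UUID=[0-9a-f:-]*/UUID=$new_uuid/" "$conf"
updated=$(grep '^ARRAY' "$conf")
echo "$updated"

# 2. Sync the corrected file to the other cluster node:
#      csync2 -xv
# 3. Clear the resource's failed state so the cluster retries it:
#      crm resource cleanup cluster_md
```

A workaround in ha/check_after_reboot would presumably perform the same three steps automatically when it detects the mismatch.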

#3 Updated by acarvajal 3 months ago

  • Subject changed from HA Migration Verification from 15-SP3 jobs fail in check_after_reboot to [qe-sap] HA Migration Verification from 15-SP3 jobs fail in check_after_reboot

#4 Updated by acarvajal 3 months ago

PR#15473 merged. Closing this.

#5 Updated by acarvajal 3 months ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#7 Updated by openqa_review 2 months ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_online_verify_sle15sp3_ha_alpha_node01_atmg
https://openqa.suse.de/tests/9517602

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#8 Updated by openqa_review 24 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_online_verify_sle15sp3_ha_alpha_node01_atmg
https://openqa.suse.de/tests/9918670

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
