action #116191

[qe-sap] HA Migration Verification from 15-SP3 jobs fail in check_after_reboot

Added by acarvajal 3 months ago. Updated 24 days ago.

Status: Feedback
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2022-09-02
Due date: -
% Done: 100%
Estimated time: -
Difficulty: -

Description

Observation

openQA test in scenario sle-15-SP5-Online-ppc64le-migration_online_verify_sle15sp3_ha_alpha_node01@ppc64le fails in
check_after_reboot

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 19.1

Further details

Always latest result in this scenario: latest

History

#1 Updated by acarvajal 3 months ago

Verifying the state of the cluster in the hb_report shows that the cluster_md resource failed to start because /dev/md/cluster_md was not present in the system.

Jobs were restarted and taken over via openQA's developer mode to verify the state of the cluster_md-related resources:

  • /dev/md/cluster_md was not present, as reported by the journal and the hb_report.
  • The backing iSCSI devices were connected.
  • /etc/mdadm.conf was present, and its DEVICE line included the paths of the 2 iSCSI devices that existed in the system.
  • A call to mdadm --assemble --scan failed with return code 2 but printed no error messages.
  • A call to mdadm --assemble --scan --verbose reported that the /dev/md/cluster_md RAID could not be initialized because the configured UUID was different from the actual UUID.
  • Checking with blkid confirmed that the UUID reported by both devices in the MD RAID was different from the one configured in /etc/mdadm.conf.
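
The check in the last two bullets boils down to comparing the UUID configured in /etc/mdadm.conf against the UUID blkid reports for the backing devices. A minimal sketch of that comparison follows; all device names, file contents and UUIDs are made-up sample values, not taken from the failing jobs. Note that blkid prints RAID-member UUIDs with dashes while mdadm.conf uses colons, so separators are stripped before comparing.

```shell
# Pull the UUID= value out of an mdadm.conf ARRAY line.
extract_conf_uuid() {
    sed -n 's/^ARRAY .*UUID=\([0-9a-f:]*\).*/\1/p' "$1"
}

# Pull the UUID="..." value out of one line of blkid output.
extract_blkid_uuid() {
    sed -n 's/.*UUID="\([^"]*\)".*/\1/p'
}

# Sample mdadm.conf (illustrative values only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
DEVICE /dev/sdb /dev/sdc
ARRAY /dev/md/cluster_md metadata=1.2 UUID=11111111:22222222:33333333:44444444
EOF

configured=$(extract_conf_uuid "$conf")

# Sample blkid output for one backing device (illustrative):
actual=$(printf '%s\n' '/dev/sdb: UUID="aaaaaaaa-bbbbbbbb-cccccccc-dddddddd" TYPE="linux_raid_member"' | extract_blkid_uuid)

# Compare with separators stripped; a mismatch is what broke assembly here.
if [ "$(printf '%s' "$configured" | tr -d ':-')" != "$(printf '%s' "$actual" | tr -d ':-')" ]; then
    echo "UUID mismatch: configured=$configured actual=$actual"
fi

rm -f "$conf"
```

On a real node the input would of course be the live /etc/mdadm.conf and the actual `blkid` output for the iSCSI devices.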

The issue was fixed by:

  • Updating the UUID in /etc/mdadm.conf with the one reported by blkid.
  • Syncing /etc/mdadm.conf to the other node with csync2.
  • Cleaning up the cluster_md resource with crm resource cleanup cluster_md.

This can be used as the basis for a workaround in the ha/check_after_reboot test module.
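
The fix steps above can be sketched as follows. The sed rewrite operates on a throwaway copy here; on the real node it would target /etc/mdadm.conf. The csync2 and crm commands need a live HA cluster, so they are shown as comments. All UUIDs are made-up sample values.

```shell
# Sample mdadm.conf with a stale UUID (illustrative values only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
ARRAY /dev/md/cluster_md metadata=1.2 UUID=11111111:22222222:33333333:44444444
EOF

# UUID actually reported by blkid on the backing devices (assumed value):
new_uuid="aaaaaaaa:bbbbbbbb:cccccccc:dddddddd"

# 1. Replace the stale UUID in mdadm.conf.
sed -i "s/UUID=[0-9a-f:-]*/UUID=$new_uuid/" "$conf"
updated=$(grep '^ARRAY' "$conf")
echo "$updated"

# 2. Sync the corrected file to the other cluster node:
#      csync2 -xv
# 3. Clear the resource's failed state so the cluster retries it:
#      crm resource cleanup cluster_md
```

A workaround in ha/check_after_reboot would presumably perform the same three steps automatically when it detects the mismatch.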

#3 Updated by acarvajal 3 months ago

  • Subject changed from HA Migration Verification from 15-SP3 jobs fail in check_after_reboot to [qe-sap] HA Migration Verification from 15-SP3 jobs fail in check_after_reboot

#4 Updated by acarvajal 3 months ago

PR#15473 merged. Closing this.

#5 Updated by acarvajal 3 months ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#7 Updated by openqa_review 2 months ago

  • Status changed from Resolved to Feedback

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_online_verify_sle15sp3_ha_alpha_node01_atmg
https://openqa.suse.de/tests/9517602

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

#8 Updated by openqa_review 24 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_online_verify_sle15sp3_ha_alpha_node01_atmg
https://openqa.suse.de/tests/9918670

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
