action #96833
[sap] test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: poo#96833 - locked block device"
Status: Closed
% Done: 100%
Description
Observation
openQA test in scenario sle-15-SP1-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm@64bit-ipmi-nvdimm fails in hana_install
Reproducible
The failure is sporadic, but it only impacts tests that schedule the sles4sap/hana_install test module on MACHINE=64bit-ipmi-nvdimm, identified by IPMI_HOSTNAME=sp.holmes.qa.suse.de.
This system has one 465G SATA HDD, which needs to be re-partitioned before installing HANA to ensure there is enough free space for the HANA installation (2.5 times the amount of RAM).
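As a rough illustration of the sizing rule above (2.5 times the amount of RAM), the check can be sketched as follows. The function names and the 128 GiB / 256 GiB RAM values are hypothetical and only used for the example; the real RAM size of the machine is not stated in this ticket.

```python
def required_hana_space_gib(ram_gib: float, factor: float = 2.5) -> float:
    """Free space needed for the HANA installation: factor * RAM."""
    return ram_gib * factor


def disk_is_large_enough(disk_gib: float, ram_gib: float, factor: float = 2.5) -> bool:
    """True if the whole disk could hold the HANA installation."""
    return disk_gib >= required_hana_space_gib(ram_gib, factor)


if __name__ == "__main__":
    disk_gib = 465  # size of the SATA HDD mentioned above
    for ram_gib in (128, 256):  # hypothetical RAM sizes, not taken from the ticket
        needed = required_hana_space_gib(ram_gib)
        verdict = "fits" if disk_is_large_enough(disk_gib, ram_gib) else "does not fit"
        print(f"RAM {ram_gib} GiB -> needs {needed} GiB, {verdict} on {disk_gib} GiB")
```

This only compares against the raw disk size; space taken by the operating system partitions would have to be subtracted in a real check.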
As can be seen on https://openqa.suse.de/tests/6797823#next_previous the test sometimes completes successfully, while at other times it is unable to clear all LVM structures left over from the previous run, resulting in a failure:
# wait_serial expected: "pvremove -f /dev/sda3; echo 2NoOV-\$?-"
# Result:
pvremove -f /dev/sda3; echo 2NoOV-$?-
# wait_serial expected: qr/2NoOV-\d+-/
# Result:
Can't open /dev/sda3 exclusively. Mounted filesystem?
Can't open /dev/sda3 exclusively. Mounted filesystem?
2NoOV-5-
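For readers not used to this log format: the command is followed by an echo of a random marker and the exit status, so the 2NoOV-5- line above means pvremove returned 5. A minimal sketch of pulling that exit status out of the serial output, with exit_code_from_serial being a hypothetical helper and not part of os-autoinst:

```python
import re
from typing import Optional


def exit_code_from_serial(output: str, marker: str) -> Optional[int]:
    """Return the exit status printed as '<marker>-<status>-', or None if absent.

    The echoed command line contains '<marker>-$?-', which does not match
    because '$?' is not a number, so only the real result line is picked up.
    """
    match = re.search(re.escape(marker) + r"-(\d+)-", output)
    return int(match.group(1)) if match else None


sample = """pvremove -f /dev/sda3; echo 2NoOV-$?-
Can't open /dev/sda3 exclusively. Mounted filesystem?
Can't open /dev/sda3 exclusively. Mounted filesystem?
2NoOV-5-
"""

if __name__ == "__main__":
    print(exit_code_from_serial(sample, "2NoOV"))
```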
Currently the test module calls wipefs -a as well as the pvremove, vgremove, lvremove and dmsetup remove commands to clean up the device, and the boot/boot_from_pxe test module also calls wipefs -a before installation, but this does not seem to be enough on some test runs.
One idea could be to zero out the whole disk during boot/boot_from_pxe, but this could turn out to be very slow due to the disk size.
Expected result
Last good: :20736:libarchive (or more recent)
Further details
Always latest result in this scenario: latest
Updated by acarvajal over 2 years ago
- Assignee set to acarvajal
As a first approach to gather more information on the state of the LVM configuration in the system when the tests fail, I will submit a PR adding more debugging commands on failure.
I will also include the poo# in the die command in these cases so the test is properly identified as failing with this poo.
Updated by acarvajal over 2 years ago
Updated by acarvajal over 2 years ago
Attempted clearing out logical volumes first with dmsetup remove and then with lvremove, and the last 5 verification runs have all completed successfully:
- https://openqa.suse.de/tests/6868507
- https://openqa.suse.de/tests/6868744
- https://openqa.suse.de/tests/6868749
- https://openqa.suse.de/tests/6868750
- https://openqa.suse.de/tests/6868751
I just cleaned the code in the PR, and will be submitting at least 5 new verification runs to be certain that this is a fix before merging.
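To illustrate the ordering idea, here is a sketch that only builds the list of cleanup commands, assuming the change means: remove the device-mapper mappings first, then the logical volumes, and only afterwards the volume group, the physical volume and the signatures. This is not the code from the PR. The function names and the vg_hana / lv_data names are hypothetical; only /dev/sda3 and the command names come from this ticket.

```python
from typing import List


def dm_name(volume_group: str, logical_volume: str) -> str:
    """Device-mapper name for an LVM volume: hyphens inside names are doubled."""
    return f"{volume_group.replace('-', '--')}-{logical_volume.replace('-', '--')}"


def cleanup_commands(partition: str, volume_group: str, logical_volumes: List[str]) -> List[str]:
    """Build the cleanup commands from the innermost layer to the outermost."""
    commands = []
    # 1. Drop the device-mapper mappings so nothing keeps the partition open.
    for lv in logical_volumes:
        commands.append(f"dmsetup remove {dm_name(volume_group, lv)}")
    # 2. Remove the logical volumes from the LVM metadata.
    for lv in logical_volumes:
        commands.append(f"lvremove -f {volume_group}/{lv}")
    # 3. Remove the volume group, then the physical volume label.
    commands.append(f"vgremove -f {volume_group}")
    commands.append(f"pvremove -f {partition}")
    # 4. Finally wipe any remaining signatures from the partition.
    commands.append(f"wipefs -a {partition}")
    return commands


if __name__ == "__main__":
    for command in cleanup_commands("/dev/sda3", "vg_hana", ["lv_data", "lv_log"]):
        print(command)
```

Building the list instead of running the commands keeps the example harmless; in the test module each command would be run on the system under test and its exit status checked.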
Updated by acarvajal over 2 years ago
- Subject changed from test fails in hana_install to test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: locked block device"
Updated by acarvajal over 2 years ago
- Subject changed from test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: locked block device" to test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: poo#96833 - locked block device"
Updated by acarvajal over 2 years ago
- Due date set to 2021-08-31
The latest changes seem to have consistently yielded positive results.
Besides the 5 tests from https://progress.opensuse.org/issues/96833?#note-3, we have 5 more tests passing with the code from the pull request:
- https://openqa.suse.de/tests/6884687
- https://openqa.suse.de/tests/6882740
- https://openqa.suse.de/tests/6882738
- https://openqa.suse.de/tests/6882735
- https://openqa.suse.de/tests/6882733
I have removed the WIP label from the pull request and will proceed to merge it.
I have also updated this ticket with an auto_review regexp so the openqa-label-known-issues-and-investigate-hook script will tag failed tests that match the new error message on L162 of tests/sles4sap/hana_install with this poo#. Currently, the latest tests that failed with this issue are:
$ env host=openqa.suse.de ./openqa-query-for-job-label poo#96833
6893068|2021-08-19 08:57:42|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6884463|2021-08-18 20:58:23|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6881752|2021-08-18 11:57:59|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6880171|2021-08-18 04:19:42|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6867933|2021-08-17 09:15:57|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:retry||grenache-1
6857549|2021-08-17 05:22:38|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6857533|2021-08-17 01:53:58|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6857525|2021-08-16 12:22:38|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6855972|2021-08-16 07:55:43|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6840844|2021-08-14 01:27:03|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
I will be monitoring for failures with this issue until the end of the month before closing the ticket.
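A quick way to sanity-check the auto_review regexp from the ticket subject is to try it against some log text. The sketch below assumes the hook simply searches the job's log for the regexp; the sample log lines are made up for the example and are not copied from a real job.

```python
import re

# Regexp taken verbatim from the ticket subject.
AUTO_REVIEW = r"(?s)tests/sles4sap/hana_install.*Test died: poo#96833 - locked block device"

# Hypothetical log excerpts, only shaped like an openQA log.
matching_log = (
    "starting hana_install tests/sles4sap/hana_install.pm\n"
    "Can't open /dev/sda3 exclusively. Mounted filesystem?\n"
    "Test died: poo#96833 - locked block device\n"
)
other_log = (
    "starting first_boot tests/boot/first_boot.pm\n"
    "Test died: needle match failed\n"
)


def is_known_issue(log_text: str) -> bool:
    """True if the log matches the auto_review regexp of this ticket."""
    return re.search(AUTO_REVIEW, log_text) is not None


if __name__ == "__main__":
    print(is_known_issue(matching_log), is_known_issue(other_log))
```

The (?s) prefix matters here: it lets .* span line breaks, so the module name and the "Test died" message can be on different lines of the log.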
Updated by maritawerner over 2 years ago
- Subject changed from test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: poo#96833 - locked block device" to [sap] test fails in hana_install auto_review:"(?s)tests/sles4sap/hana_install.*Test died: poo#96833 - locked block device"
Updated by acarvajal over 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
Since https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13069 was merged, no more jobs have failed on this issue.
openqa-query-for-job-label poo#96833 shows no failures newer than the ones listed before:
acarvajal@linux-mkji:~/git/openqa-scripts [master|✔] > env host=openqa.suse.de ./openqa-query-for-job-label poo#96833
6893068|2021-08-19 08:57:42|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6888371|2021-08-19 01:58:33|done|failed|sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6884463|2021-08-18 20:58:23|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6881752|2021-08-18 11:57:59|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6880171|2021-08-18 04:19:42|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6867933|2021-08-17 09:15:57|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:retry||grenache-1
6857549|2021-08-17 05:22:38|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6857533|2021-08-17 01:53:58|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6857525|2021-08-16 12:22:38|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
6855972|2021-08-16 07:55:43|done|failed|qam-sles4sap_online_dvd_gnome_hana_nvdimm||grenache-1
Meanwhile, looking at the job results on the worker https://openqa.suse.de/admin/workers/1264 yields the following since August 20th (the PR was merged on the 19th):
Total Jobs: 151
Passed: 65 (43%)
Obsoleted: 59 (39%)
Incomplete: 4 (3%)
Failed: 23 (15%)
- 7 on installation/add_update_test_repo
- 3 on installation/partitioning_firstdisk
- 3 on sles4sap/hana_test (probably performance-related. Will create a new poo# to handle this)
- 5 on boot/boot_from_pxe (failed connection to IPMI server)
- 1 on installation/welcome (seems also like a connection issue with IPMI server)
- 1 on sles4sap/wizard_hana_install (bsc#1184679)
- 3 on boot/first_boot (new DM screen in 15-SP4)
And if we remove the Obsoleted tests we get:
Total Jobs: 92
Passed: 65 (71%)
Incomplete: 4 (4%)
Failed: 23 (25%)
No failures in hana_install in any of these tests.
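The percentages above can be re-derived from the raw counts with a few lines. This is only a cross-check of the numbers in this comment; the dictionary keys are shortened labels chosen for the example.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Percentage of each result, rounded to the nearest whole number."""
    total = sum(counts.values())
    return {name: round(100 * value / total) for name, value in counts.items()}


all_jobs = {"passed": 65, "obsoleted": 59, "incomplete": 4, "failed": 23}
without_obsoleted = {k: v for k, v in all_jobs.items() if k != "obsoleted"}

# Failed jobs per test module, as listed above.
failed_by_module = {
    "installation/add_update_test_repo": 7,
    "installation/partitioning_firstdisk": 3,
    "sles4sap/hana_test": 3,
    "boot/boot_from_pxe": 5,
    "installation/welcome": 1,
    "sles4sap/wizard_hana_install": 1,
    "boot/first_boot": 3,
}

if __name__ == "__main__":
    print(sum(all_jobs.values()), shares(all_jobs))
    print(sum(without_obsoleted.values()), shares(without_obsoleted))
    print(sum(failed_by_module.values()))
```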
Closing this.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: qam-sles4sap_online_dvd_gnome_hana_nvdimm@64bit-ipmi-nvdimm
https://openqa.suse.de/tests/7431617
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234