action #54764

[ha][migration] migration_verify_sle11sp4_ha_alpha_node02 fails in check_after_reboot

Added by acarvajal almost 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 2019-07-29
Due date:
% Done: 100%
Estimated time:
Difficulty:

Description

Starting between 08.07.2019 and 10.07.2019, HA migration tests from 11-SP4 to 12-SP5 (tests migration_verify_sle11sp4_ha_alpha_node01 and migration_verify_sle11sp4_ha_alpha_node02) are failing because the VMs restart after the cluster is started on the cluster nodes.

Because of the step at which the test fails, it has not been possible to gather logs so far. Still working on a way to accomplish this.

Jobs run before that date completed successfully:

Node1 Upgrade: https://openqa.suse.de/tests/3041853
Node2 Upgrade: https://openqa.suse.de/tests/3041206
Node1: https://openqa.suse.de/tests/3041856
Node2: https://openqa.suse.de/tests/3041855
Support Server: https://openqa.suse.de/tests/3041854

However, after that date, the Node Upgrade tests still work:

Node1 Upgrade: https://openqa.suse.de/tests/3161942
Node2 Upgrade: https://openqa.suse.de/tests/3161939

While the cluster tests fail with the problem described above:

Node1: https://openqa.suse.de/tests/3161945
Node2: https://openqa.suse.de/tests/3161944
Support Server: https://openqa.suse.de/tests/3161943

(The video for Node1 shows that this node is also restarting.)

This issue affects only the 11-SP4 to 12-SP5 HA migration scenario. All other migration scenarios (12-SP2-LTSS, 12-SP3-LTSS, 12-SP4, etc.) are working.

I do not think this is related to the build itself, as the same test works in our development environment using the qcow2 images generated by openqa.suse.de's upgrade tests:

Node1: http://mango.suse.de/tests/1169
Node2: http://mango.suse.de/tests/1170
Support Server: http://mango.suse.de/tests/1168

I also performed manual testing and was not able to reproduce the issue.

I think this could be related to the upgrade of the x86_64 workers to Leap 15.1, as the workers in our development environment are using SLES 12-SP3.

I could not find the exact date of the migration to Leap 15.1 in https://confluence.suse.com/pages/viewpage.action?pageId=194052156, but I think it was on 09.07.2019.
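
One way to check the worker hypothesis might be to compare which worker ran a passing job and a failing job. The following is only a minimal sketch: it assumes the openQA instance serves job details as JSON under /api/v1/jobs/<id>, the helper name job_api_url is hypothetical, and actually fetching and inspecting the responses is left out. The job URLs are the Node2 jobs listed above.

```python
from urllib.parse import urlparse

# Node2 cluster jobs from this ticket: one from before the failures, one failing.
PASSING_JOB = "https://openqa.suse.de/tests/3041855"
FAILING_JOB = "https://openqa.suse.de/tests/3161944"


def job_api_url(job_url: str) -> str:
    """Map a job page URL (.../tests/<id>) to its assumed JSON endpoint.

    Assumption: the instance exposes job details at /api/v1/jobs/<id>.
    """
    parts = urlparse(job_url)
    segments = parts.path.strip("/").split("/")
    # Expect exactly "tests/<numeric id>"; anything else is rejected.
    if len(segments) != 2 or segments[0] != "tests" or not segments[1].isdigit():
        raise ValueError(f"not a job URL: {job_url}")
    return f"{parts.scheme}://{parts.netloc}/api/v1/jobs/{segments[1]}"


if __name__ == "__main__":
    # Print the endpoints whose responses could be compared by hand.
    for label, url in (("passing", PASSING_JOB), ("failing", FAILING_JOB)):
        print(label, job_api_url(url))
```

If both responses name the worker that executed the job, a difference in worker host or worker OS between the two would support the Leap 15.1 theory; identical workers would argue against it.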
