Project

General

Profile

Actions

action #95458

open

[qe-sap][ha] SUT reboots unexpectedly, leading to tests failing in HA scenarios auto_review:"(?s)tests/ha.*(command.*timed out|Test died).*match=root-console timed out":retry

Added by acarvajal almost 3 years ago. Updated 5 months ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
Infrastructure
Target version:
Start date:
2021-07-13
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA tests in HA scenarios (2 or 3 node clusters) fail in different modules due to unexpected reboots
in one or more of the SUTs:

  1. QAM HA qdevice node 1, fails in ha_cluster_init module
  2. QAM HA rolling upgrade migration, node 2, fails in filesystem module
  3. QAM HA hawk/HAProxy node 1, fails in check_after_reboot module
  4. QAM 2 nodes, node 1, fails in ha_cluster_init module

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Issue is very sporadic, and reproducing it is not always possible. Usually, re-triggering the jobs lead to the tests passing.

For example, from the jobs above, re-triggered jobs succeeded:

  1. https://openqa.suse.de/tests/6435165
  2. https://openqa.suse.de/tests/6435380
  3. https://openqa.suse.de/tests/6435384
  4. https://openqa.suse.de/tests/6435389

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#95458

Expected result

  1. Last good: :20349:samba (or more recent)
  2. Last good: :20121:crmsh
  3. Last good: 20321:kernel-ec2
  4. Last good: MR:244261:crmsh

Related issues 3 (1 open2 closed)

Related to openQA Tests - action #95788: [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network Feedback2021-07-21

Actions
Related to openQA Tests - action #94171: [qem][sap] test fails in check_logs about 50% of timesRejected2021-06-17

Actions
Related to openQA Tests - action #97013: [qe-core][qe-yast] test fails in handle_reboot, patch_and_reboot, installationResolved2021-08-17

Actions
Actions

Also available in: Atom PDF