action #150995

open

Fix MM setup on diesel so test scenarios like ha_ctdb_supportservertest-ppc-mm work

Added by mkittler about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-11-17
Due date:
% Done: 0%
Estimated time:
Description

Observation

The test scenario ha_ctdb_supportservertest-ppc-mm and its parallel children fail when scheduled across diesel and another PowerPC worker like mania. The scenario works when scheduled across petrol and mania, so the host diesel is presumably the culprit.

The problematic test module was iscsi_client in the child job running on mania (ha_ctdb_node02test-ppc-mm when reproducing via the clone command mentioned below). After installing http://download.opensuse.org/update/leap/15.3/sle/ppc64le/kernel-default-5.3.18-150300.59.93.1.ppc64le.rpm the failing module is ha_cluster_join, where ping -c1 ctdb-node01 fails with "ping: ctdb-node01: Temporary failure in name resolution" or a similar error. At least that is how it looked in three consecutive test runs at the time of creating this ticket.
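To narrow down the ping failure it can help to check name resolution and reachability separately. The following is only a minimal sketch, not part of the test code: check_host is a hypothetical helper, and it assumes that getent and ping are available on the system under test.

```shell
# Hypothetical helper (not part of openQA or the test distribution):
# tell a name resolution failure apart from a plain connectivity failure.
check_host() {
    host=$1
    # getent queries the NSS "hosts" database (/etc/hosts, DNS, ...).
    if ! getent hosts "$host" >/dev/null 2>&1; then
        echo "$host: name resolution failed"
        return 1
    fi
    # The name resolves, so a failing ping points at the network path.
    if ! ping -c1 "$host" >/dev/null 2>&1; then
        echo "$host: resolves but is unreachable"
        return 2
    fi
    echo "$host: resolves and is reachable"
}
```

Running check_host ctdb-node01 on the failing node would show whether the resolver (for example the DNS service provided by the support server) or the tunnel between the worker hosts is the more likely suspect.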

Steps to reproduce

  1. Ensure the MM setup on diesel is up-to-date. Since the machine might not have the tap worker class (due to the problem we're trying to solve here) salt might not have updated its configuration. Ensure that there is at least a valid GRE tunnel config to the other PowerPC workers.
  2. Go to https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-2g&test=ha_ctdb_supportserver&version=15-SP6 to find the latest job in the mentioned scenario. Possibly remove query parameters if the page shows no recent result.
  3. Invoke sudo openqa-clone-job --skip-download --export-command --skip-chained-deps https://openqa.suse.de/tests/12799476 {TEST,BUILD}+='test-ppc-mm' _GROUP=0 on OSD.
  4. In the exported command, replace the first two worker class variable values with diesel and the others with mania.
  5. Follow the test execution of the four jobs in the cluster.
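Step 4 can be scripted instead of editing the exported command by hand. This is a rough sketch under assumptions: assign_worker_classes is a hypothetical helper, and it assumes that the exported command contains one setting of the form WORKER_CLASS:<job id>=<value> per job and that the values consist only of letters, digits, and the characters "_", ",", "." and "-".

```shell
# Hypothetical helper: rewrite the WORKER_CLASS values of an exported
# clone command read from stdin.
# $1: worker class for the first two jobs, $2: class for all remaining jobs.
# Assumption: settings look like WORKER_CLASS:<job id>=<value>.
assign_worker_classes() {
    awk -v first="$1" -v rest="$2" '{
        out = ""
        # Consume the line match by match; n counts matches across all lines.
        while (match($0, /WORKER_CLASS[^= ]*=[A-Za-z0-9_,.-]*/)) {
            n++
            key = substr($0, RSTART, RLENGTH)
            sub(/=.*/, "", key)          # keep only the part before "="
            out = out substr($0, 1, RSTART - 1) key "=" (n <= 2 ? first : rest)
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}
```

With the command from step 3 saved to a file (the name exported-command.txt is just an example), assign_worker_classes diesel mania < exported-command.txt prints the adjusted command, which should be reviewed before running it.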

Acceptance criteria

  • AC1: The host diesel is able to run MM jobs when some jobs of the cluster are running on other hosts; in particular, the mentioned scenario works.
  • AC2: The tap worker class is configured for diesel in workerconf.sls.

Suggestions and remarks

  • See #136130 for previous work in that area. The steps to reproduce above are basically the result of my investigation for that ticket, described in #136130#note-49 and further comments.
  • Try to run a more basic MM test scenario. Maybe it is possible to port the basic wicked scenario to PowerPC.
  • There's an ongoing discussion with Dirk on Slack. I will update this ticket with our findings.

Related issues 2 (1 open, 1 closed)

Related to openQA Tests (public) - action #136130: test fails in iscsi_client due to salt 'host'/'nodename' confusion size:M (Resolved, mkittler, 2023-09-20)

Copied to openQA Infrastructure (public) - action #151606: Ensure that the tap class is fully enabled on petrol (New, 2023-11-17)

Actions #1

Updated by okurz about 1 year ago

  • Target version set to future
Actions #2

Updated by okurz about 1 year ago

  • Related to action #136130: test fails in iscsi_client due to salt 'host'/'nodename' confusion size:M added
Actions #3

Updated by okurz about 1 year ago

  • Copied to action #151606: Ensure that the tap class is fully enabled on petrol added