action #154552 (closed)

coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

[ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de

Added by acarvajal 10 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee: mkittler
Category: Support
Target version: Ready
Start date: 2024-01-30
% Done: 0%

Description

Observation

openQA test in scenario sle-15-SP6-Online-ppc64le-SAPHanaSR_ScaleUp_PerfOpt_WMP_node01@ppc64le-sap fails in iscsi_client

Other MM jobs on ppc64le in the job group also failed:

https://openqa.suse.de/tests/13381522#step/iscsi_client/9

But the failure seems to be limited to ppc64le, as the equivalent x86_64 jobs cleared this step:

https://openqa.suse.de/tests/13382300 & https://openqa.suse.de/tests/13382301
https://openqa.suse.de/tests/13382303 & https://openqa.suse.de/tests/13382304

(Those fail later due to an unrelated bsc#)

The recommendation is to investigate whether something changed or whether something is wrong on the qemu_ppc64le-large-mem workers, as HA jobs in the same build on ppc64le were able to clear that test module and in some cases pass completely:

Alpha Cluster: https://openqa.suse.de/tests/13364670 & https://openqa.suse.de/tests/13364672 (passes)
Beta Cluster: https://openqa.suse.de/tests/13364675 & https://openqa.suse.de/tests/13364678 (fails later in filesystem module)
(There are other examples in https://openqa.suse.de/tests/overview?distri=sle&version=15-SP6&build=50.1&groupid=143)

Reproducible

Fails since (at least) Build 50.1

Same test with same build but 3 days ago did not show this issue: https://openqa.suse.de/tests/13364664

Further details

Always latest result in this scenario: latest


Related issues: 4 (1 open, 3 closed)

Related to openQA Tests - action #95788: [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network (Feedback, 2021-07-21)

Related to openQA Project - action #153769: Better handle changes in GRE tunnel configuration size:M (Resolved, okurz, 2024-01-17)

Related to openQA Project - action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M (Resolved, mkittler, 2023-12-11)

Copied to openQA Infrastructure - action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M (Resolved, jbaier_cz, 2024-01-30)
Actions #1

Updated by acarvajal 10 months ago

  • Related to action #95788: [qe-sap][ha][shap] test fails in iscsi_client or other modules in HA tests, missing network added
Actions #2

Updated by mkittler 10 months ago

The GRE configuration on petrol and mania looks generally good. Maybe it still doesn't work in practice, though.

In all the failures I've seen, the support server ran on mania and the other jobs on petrol, so the GRE tunnel setup may be relevant here. I suppose one could cross-check with a simple scenario or a VM as documented on https://open.qa/docs/#_verify_the_setup. Not sure whether e.g. the ping_… scenario works on ppc64le at all, though.
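Such a manual cross-check could look roughly like the sketch below. This is not taken from the ticket: the hostnames and guest IP are placeholders, the `ovs-vsctl`/`ping` invocations are assumptions loosely based on the linked open.qa documentation, and the 1350-byte payload matches the auto_review string of the related ticket #152389. With the default `DRY_RUN=1` the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hedged sketch of a manual GRE sanity check between two MM workers.
# DRY_RUN=1 (default) only prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() { [ -n "$DRY_RUN" ] && echo "$*" || "$@"; }

check_worker() {
  # Show the Open vSwitch bridge, including its GRE ports, on one worker.
  run ssh "root@$1" "ovs-vsctl show"
}

mtu_probe() {
  # Ping a guest on the peer worker with a 1350-byte payload and the
  # don't-fragment flag, to catch GRE-related MTU problems.
  run ping -M do -s 1350 -c 3 "$1"
}

# Worker names and the guest IP below are illustrative placeholders.
check_worker petrol
check_worker mania
mtu_probe 10.0.2.16
```

Unset `DRY_RUN` to actually execute the commands once the printed sequence looks right.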

Actions #3

Updated by mkittler 10 months ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler 10 months ago · Edited

I assigned myself for some initial investigation so we have something to work with when estimating the ticket.

As a first step I created a test cluster for testing the GRE connection:

openqa-clone-job --skip-chained-deps --within-instance https://openqa.suse.de/tests/13366259 _GROUP=0 BUILD+=-gre-test-for-poo-154552 WORKER_CLASS:wicked_basic_ref+=,mania WORKER_CLASS:wicked_basic_sut+=,petrol

2 jobs have been created:

Actions #5

Updated by okurz 10 months ago

  • Tags set to infra, multi-machine
  • Project changed from openQA Infrastructure to openQA Project
  • Category set to Support
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #6

Updated by okurz 10 months ago

  • Related to action #153769: Better handle changes in GRE tunnel configuration size:M added
Actions #7

Updated by okurz 10 months ago

  • Related to action #152389: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other multi-machine scenarios due to MTU size auto_review:"ping with packet size 1350 failed, problems with MTU" size:M added
Actions #8

Updated by okurz 10 months ago

  • Parent task set to #111929
Actions #9

Updated by okurz 10 months ago

  • Subject changed from test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de to [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: openqa.suse.de
Actions #10

Updated by mkittler 10 months ago · Edited

It worked again after rebooting petrol and mania: https://openqa.suse.de/tests/13382492

Before that, the wicked test scenario also didn't work, and just running the preup script again (on both workers) to delete and re-add the GRE tunnel connection didn't help either.
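For reference, deleting and re-adding a GRE port on an Open vSwitch bridge (which is roughly what the preup script does) can be sketched as below. The bridge name, port name, and remote IP are assumptions for illustration, not values from this ticket; with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Hedged sketch: delete and re-create a GRE port with ovs-vsctl.
# Bridge/port names and the peer address are placeholders.
DRY_RUN=${DRY_RUN:-1}
run() { [ -n "$DRY_RUN" ] && echo "$*" || "$@"; }

recreate_gre_port() {
  bridge=$1 port=$2 remote=$3
  # Remove the port if present, then add it back as a GRE interface
  # pointing at the peer worker's address.
  run ovs-vsctl --if-exists del-port "$bridge" "$port"
  run ovs-vsctl add-port "$bridge" "$port" \
    -- set interface "$port" type=gre options:remote_ip="$remote"
}

recreate_gre_port br1 gre1 192.0.2.10   # placeholder peer address
```

As the comment above notes, in this case re-creating the tunnel was not sufficient and only a reboot helped.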


The scenario mentioned in the ticket description works again as well: https://openqa.suse.de/tests/13382503
(Although in this test run all jobs ran only on petrol.)

Actions #11

Updated by okurz 10 months ago

That leaves open the question of what caused this, as we didn't have any intended changes in the GRE setup lately, did we?

Also, next time we see such a problem, we can try the following alternatives before resorting to rebooting:

  • wicked ifup all
  • systemctl restart network
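The escalation order above can be sketched as a small script. The worker hostname is a placeholder; the two lighter steps are exactly the alternatives listed, with reboot only as the last resort, and in practice one would re-check connectivity between the steps and stop as soon as the tunnel works again. With the default `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Hedged sketch: network-recovery escalation on one MM worker,
# trying the lighter steps first and rebooting only as a last resort.
DRY_RUN=${DRY_RUN:-1}
run() { [ -n "$DRY_RUN" ] && echo "$*" || "$@"; }

recover_network() {
  host=$1   # worker hostname, placeholder value in the call below
  run ssh "root@$host" "wicked ifup all"
  run ssh "root@$host" "systemctl restart network"
  run ssh "root@$host" "systemctl reboot"
}

recover_network petrol
```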
Actions #12

Updated by okurz 10 months ago

  • Status changed from New to Resolved

Problem is gone. We will follow up in related tickets, e.g. #153769.

Actions #13

Updated by okurz 10 months ago

  • Copied to action #154624: Periodically running simple ping-check multi-machine tests on x86_64 covering multiple physical hosts on OSD alerting tools team on failures size:M added