action #104106: [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S - openQA Infrastructure (public) - openSUSE Project Management Tool

action #104106

closed

[qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S

Added by szarate about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: mkittler
Target version: openQA Project (public) - Ready
Start date: 2021-12-16

Description

Observation

Long story short: for the last few builds, await_install has been failing sporadically, but it passes after the test is retriggered. Looking a bit closer, the test appears to run up against the roughly 30-minute timeout while still making steady progress, falling just a few minutes short of finishing.

openQA test in scenario sle-15-SP4-Online-ppc64le-create_hdd_gnome@ppc64le-2g fails in await_install

Test suite description

Image creation job used as a parent for other jobs that test based on an existing installation. To be used as START_AFTER_TEST=create_hdd_gnome.

Reproducible

Fails since (at least) Build 74.1 (current job)

Expected result

Last good: 70.1 (or more recent)

Further details

Always latest result in this scenario: latest

Suggestions

Increase TIMEOUT_SCALE as well as MAX_SETUP_TIME.
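As a rough sketch of why raising TIMEOUT_SCALE helps, assume that (as os-autoinst does when scaling test-module timeouts) a timeout is multiplied by TIMEOUT_SCALE, with a default of 1. The Python snippet below checks whether an installation of a given duration fits within the scaled timeout. The function names, the 30-minute base and the 34-minute duration are hypothetical illustrations, not values taken from the failing job, and MAX_SETUP_TIME is not modeled.

```python
# Hypothetical illustration: how a TIMEOUT_SCALE setting stretches a timeout.
# Assumption: the effective timeout is the base timeout multiplied by
# TIMEOUT_SCALE (1 if unset). Numbers below are illustrative only.


def scale_timeout(timeout_s: float, settings: dict) -> float:
    """Return the timeout multiplied by TIMEOUT_SCALE (default 1)."""
    return timeout_s * float(settings.get("TIMEOUT_SCALE", 1))


def fits_in_timeout(duration_s: float, timeout_s: float, settings: dict) -> bool:
    """True if a step taking duration_s finishes before the scaled timeout."""
    return duration_s <= scale_timeout(timeout_s, settings)


base_timeout = 30 * 60      # roughly the 30-minute threshold from the observation
install_duration = 34 * 60  # hypothetical run that needs a few more minutes

print(fits_in_timeout(install_duration, base_timeout, {}))
print(fits_in_timeout(install_duration, base_timeout, {"TIMEOUT_SCALE": "2"}))
```

The first call prints False and the second prints True. Scaling the timeout only hides the slowdown; it does not address the network bottleneck itself.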

Related issues: 1 (0 open, 1 closed)

Updated by maritawerner about 3 years ago

@szarate A ticket for the QE Tools team? For the Power backend?

Updated by szarate about 3 years ago

Project changed from openQA Tests (public) to openQA Infrastructure (public)
Subject changed from test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing
Description updated (diff)
Category deleted (Bugs in existing tests)

Created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13885 for the time being. I'll keep an eye on the problem to see how it behaves in the future, but somebody will likely have to take a look to figure out where the network bottleneck is.

Adding Oliver as a watcher to see if he has any suggestions.

Updated by szarate about 3 years ago

@maritawerner wrote:

@szarate A ticket for the QE Tools team? For the Power backend?

Whoever takes care of the infrastructure and network. For now, though, I'm leaving it with qe-core but moving it to the infrastructure project, as this has little to do with the tests and more to do with the hardware itself.

Updated by okurz about 3 years ago

Target version set to Ready

Please be aware of #102882. As long as we have that degraded network performance on most of our PPC workers, we can run fewer tests in parallel and wait longer, e.g. by scaling timeouts. Just today I fixed a problem where worker instances that should have stayed disabled reappeared: likely mkittler only stopped the instances, so they came back after a reboot. Consider any test results on PPC machines from the last day affected by that. I added this ticket to our backlog, as we should at least scale timeouts in the host-specific configs.

Updated by okurz about 3 years ago

Priority changed from Normal to Urgent

Updated by szarate about 3 years ago

Related to coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added

Updated by livdywan about 3 years ago

Subject changed from [qe-core] test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S
Description updated (diff)
Status changed from New to Workable
Assignee set to mkittler

Updated by mkittler about 3 years ago

Status changed from Workable to Feedback

SR to work around the slow network performance: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/375

Updated by okurz almost 3 years ago

Status changed from Feedback to Resolved

The MR was merged and the change to the worker config is in effect. In the scenario from the original description, the most recent job ("latest") ran four days ago, i.e. before the worker config change. However, this should suffice for the ticket at hand. #102882 still covers the real issue. Anyone, feel free to use auto-review with a regex in #102882 to automatically catch and retrigger such failures until a proper solution is found for the root cause of the degraded network performance.

Updated by openqa_review almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_gnome@ppc64le-2g
https://openqa.suse.de/tests/8158017

To prevent further reminder comments, one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released" or "EOL" (End-of-Life)
The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
