action #104106
closed [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S
Description
Observation
Long story short: for the last few builds, await_install has been failing sporadically, although it passes after retriggering the test. Looking a bit closer, the installation seems to hit the timeout threshold of roughly 30 minutes while it is still progressing, only a few minutes short of finishing.
openQA test in scenario sle-15-SP4-Online-ppc64le-create_hdd_gnome@ppc64le-2g fails in await_install
Test suite description
Image creation job used as parent for other jobs that test on top of an existing installation. To be used as START_AFTER_TEST=create_hdd_gnome.
Reproducible
Fails since (at least) Build 74.1 (current job)
Expected result
Last good: 70.1 (or more recent)
Further details
Always latest result in this scenario: latest
Suggestions
- Increase TIMEOUT_SCALE, and also MAX_SETUP_TIME
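
A rough sketch of why raising TIMEOUT_SCALE would help here, assuming the factor simply multiplies a base timeout. The function names and the 33-minute installation duration are hypothetical and only illustrate the "roughly 30 minutes, a few minutes short" situation from the observation; MAX_SETUP_TIME is not modelled.

```python
def scaled_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Return the effective timeout after applying a TIMEOUT_SCALE-style factor."""
    if timeout_scale <= 0:
        raise ValueError("timeout_scale must be positive")
    return base_seconds * timeout_scale


def finishes_in_time(duration_seconds: float, base_seconds: float,
                     timeout_scale: float = 1.0) -> bool:
    """True if a step of the given duration fits into the scaled timeout."""
    return duration_seconds <= scaled_timeout(base_seconds, timeout_scale)


# Hypothetical numbers: a 30 minute base timeout and an installation
# that needs about 33 minutes on a slow network.
base = 30 * 60
needed = 33 * 60

print(finishes_in_time(needed, base))                   # unscaled timeout
print(finishes_in_time(needed, base, timeout_scale=2))  # scaled by factor 2
```

With these assumed numbers the unscaled check fails and the scaled one passes, which matches the observation that the job was still making progress when it was aborted.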
Updated by maritawerner almost 3 years ago
@szarate a ticket for the qe tools team? for the power backend?
Updated by szarate almost 3 years ago
- Project changed from openQA Tests to openQA Infrastructure
- Subject changed from test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing
- Description updated (diff)
- Category deleted (Bugs in existing tests)
Created https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13885 for the time being. I'll keep an eye on the problem to see how it behaves in the future, but somebody will likely have to take a look to figure out where the network bottleneck is.
Adding Oliver as a watcher to see if he has any suggestions
Updated by szarate almost 3 years ago
@maritawerner wrote:
@szarate a ticket for the qe tools team? for the power backend?
Whoever has to take care of the infrastructure and network. For now I'm leaving it within qe-core but in the infrastructure project, as this has little to do with tests and more to do with the hardware itself.
Updated by okurz almost 3 years ago
- Target version set to Ready
Please be aware of #102882. As long as we have that degraded network performance affecting most of our PPC workers, we can only run fewer tests in parallel and wait longer, e.g. scale timeouts. Just today I fixed the problem that worker instances that should have stayed disabled reappeared. Likely mkittler only stopped the instances, so after a reboot they came back. So consider any test results on PPC machines from the last day affected by that. I added the ticket to our backlog as we should at least scale timeouts in the host-specific configs.
Updated by szarate almost 3 years ago
- Related to coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added
Updated by livdywan almost 3 years ago
- Subject changed from [qe-core] test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to mkittler
Updated by mkittler almost 3 years ago
- Status changed from Workable to Feedback
SR to work around the slow network performance: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/375
Updated by okurz almost 3 years ago
- Status changed from Feedback to Resolved
The MR was merged and the change to the worker config is effective. In the scenario from the original description, the last job within latest is from four days ago, so from before the worker config changes. However, this should suffice for the ticket at hand. There is still #102882 about the real issue. Anyone, feel free to use auto-review with a regex in #102882 to automatically catch and retrigger such failures until a proper solution can be found for the root cause of the degraded network performance.
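
For illustration, a minimal sketch of how such an auto-review match could work. It assumes the ticket subject carries a marker of the form `auto_review:"<regex>":retry`. The example subject, the regex and the failure reason are hypothetical, and this is not the actual auto-review implementation.

```python
import re

# Assumed marker shape inside a ticket subject: auto_review:"<regex>":retry
# The ":retry" suffix is treated as optional here.
MARKER = re.compile(r'auto_review:"(?P<regex>.+?)"(?P<retry>:retry)?')


def parse_marker(subject):
    """Return (compiled regex, retry flag) or None if the subject has no marker."""
    match = MARKER.search(subject)
    if match is None:
        return None
    return re.compile(match.group("regex")), match.group("retry") is not None


def should_retrigger(subject, failure_reason):
    """True if the subject asks for a retry and its regex matches the failure."""
    parsed = parse_marker(subject)
    if parsed is None:
        return False
    pattern, retry = parsed
    return retry and pattern.search(failure_reason) is not None


# Hypothetical subject and failure text, for illustration only.
subject = '[epic] broken cache service auto_review:"timeout.*exceeded":retry'
reason = "await_install: timeout of 1800s exceeded"
print(should_retrigger(subject, reason))
```

Keeping the regex in the ticket subject means the matching rule lives next to the discussion of the root cause and can be dropped once #102882 is fixed.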
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_gnome@ppc64le-2g
https://openqa.suse.de/tests/8158017
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234