action #104106
closed
[qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S
Added by szarate almost 3 years ago.
Updated over 2 years ago.
Description
Observation
Long story short: for the last few builds, await_install has been failing sporadically, though it passes after retriggering the test. Looking a bit closer, the test seems to hit the timeout threshold of roughly 30 minutes while the installation is still progressing, just a few minutes short of finishing.
openQA test in scenario sle-15-SP4-Online-ppc64le-create_hdd_gnome@ppc64le-2g fails in await_install
Test suite description
Image creation job used as a parent for other jobs that test on top of an existing installation. To be used as START_AFTER_TEST=create_hdd_gnome
Reproducible
Fails since (at least) Build 74.1 (current job)
Expected result
Last good: 70.1 (or more recent)
Further details
Always latest result in this scenario: latest
Suggestions
- Increase TIMEOUT_SCALE, and also MAX_SETUP_TIME
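To illustrate why a larger TIMEOUT_SCALE would help here, the arithmetic can be sketched as below. This is only a minimal sketch, assuming TIMEOUT_SCALE acts as a plain multiplier on the test timeout; the 30 minute base comes from the observation above, while the function name `scaled_timeout` and the 5 minute overrun are hypothetical values picked for illustration.

```python
def scaled_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Return the effective timeout, assuming the scale is a plain multiplier."""
    if timeout_scale <= 0:
        raise ValueError("timeout_scale must be positive")
    return base_seconds * timeout_scale


# Roughly the 30 minute threshold mentioned in the observation.
base = 30 * 60
# Hypothetical: the installation needs about 5 minutes more than the threshold.
needed = base + 5 * 60

for scale in (1, 2):
    effective = scaled_timeout(base, scale)
    verdict = "enough" if effective >= needed else "too short"
    print(f"TIMEOUT_SCALE={scale}: {effective / 60:.0f} min ({verdict})")
```

Under these assumptions even a small scale factor covers the few missing minutes; how MAX_SETUP_TIME should be raised alongside it is not modelled here.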
@szarate a ticket for the qe tools team? for the power backend?
- Project changed from openQA Tests to openQA Infrastructure
- Subject changed from test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing
- Description updated (diff)
- Category deleted (Bugs in existing tests)
@maritawerner wrote:
@szarate a ticket for the qe tools team? for the power backend?
Whoever has to take care of the infrastructure and network... For now I'm leaving it with qe-core, but in the infrastructure project, as this has little to do with tests and more with the hardware per se.
- Target version set to Ready
Please be aware of #102882. As long as we have that degraded network performance on most of our PPC workers, we can only run fewer tests in parallel and wait longer, e.g. by scaling timeouts. Just today I fixed the problem that instances that should have stayed disabled reappeared. Likely mkittler only stopped the instances, and after a reboot they came back. So consider any test results on PPC machines from the last day affected by that. I added the ticket to our backlog, as we should at least scale timeouts in the host-specific configs.
- Priority changed from Normal to Urgent
- Related to coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added
- Subject changed from [qe-core] test fails in await_install - Network performance for ppc installations is decreasing to [qe-core] test fails in await_install - Network performance for ppc installations is decreasing size:S
- Description updated (diff)
- Status changed from New to Workable
- Assignee set to mkittler
- Status changed from Workable to Feedback
- Status changed from Feedback to Resolved
MR was merged and the change to the worker config is effective. In the scenario from the original description, the last job within latest was from four days ago, so before the worker config changes. However, this should suffice for the ticket at hand. There is still #102882 about the real issue. Anyone, feel free to use auto-review with a regex in #102882 to automatically catch and retrigger such failures until a proper solution is found for the root cause of the degraded network performance.
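The idea of catching such failures with a regex could be sketched roughly as follows. This is not the auto-review implementation, only an illustration of the matching step; the pattern, the function name `should_retrigger` and the sample failure texts are all hypothetical, and the real regex would have to be agreed on in #102882.

```python
import re

# Hypothetical pattern: a failure in await_install that mentions a timeout.
AUTO_REVIEW_PATTERN = re.compile(
    r"await_install.*(timeout|timed out)",
    re.IGNORECASE | re.DOTALL,
)


def should_retrigger(failure_text: str) -> bool:
    """Return True if the failure text matches the known-issue pattern."""
    return AUTO_REVIEW_PATTERN.search(failure_text) is not None


# Made-up failure texts, only to exercise the matcher.
samples = [
    "Test died in await_install: timeout exceeded",
    "needle mismatch in first_boot",
]
for text in samples:
    action = "retrigger" if should_retrigger(text) else "leave for manual review"
    print(f"{text!r}: {action}")
```

A narrow pattern is preferable here, so that unrelated failures are not retriggered and hidden from manual review.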
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: create_hdd_gnome@ppc64le-2g
https://openqa.suse.de/tests/8158017
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234