action #60833
closed [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected
0%
Description
see related issue reported in https://progress.opensuse.org/issues/56087
We have statistics that clearly show a performance issue on openqaworker-arm-1 and openqaworker-arm-2:
We should reduce the number of workers on these two machines.
Updated by zluo about 5 years ago
- Related to action #46190: [functional][u] test fails in user_settings - mistyping in Username (lowercase instead of uppercase) added
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: extra_tests_on_gnome
https://openqa.suse.de/tests/3731488
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: gnome+proxy_SCC+allmodules
https://openqa.suse.de/tests/3758637
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by SLindoMansilla almost 5 years ago
The machines are again using the old number of worker instances, which is known to produce typing issues: https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L489
This commit https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/cef2ca2755860394d0ace4178ef51cc800dc34fe suggests masking the services, and the tools team agreed that masking should be used while investigating.
Once the right number of workers is known, this should be changed in the salt state https://gitlab.suse.de/openqa/salt-pillars-openqa/blob/master/openqa/workerconf.sls#L489
Updated by SLindoMansilla almost 5 years ago
- Assignee set to SLindoMansilla
- Priority changed from Normal to High
Performing binary search for openqaworker-arm-1.
Starting workers: 20
Trying with: 10 (workers from 11 to 20 stopped and masked)
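For reference, the binary search described above could be sketched roughly as follows. This is only an illustration under assumptions: `is_stable` is a hypothetical callback standing in for "run the test jobs with this many worker instances and check for stalls or typing issues", and the sketch assumes that if n instances are stable, any smaller number is stable too. Neither function is part of openQA.

```python
from typing import Callable, List


def find_max_stable_workers(max_workers: int,
                            is_stable: Callable[[int], bool]) -> int:
    """Return the largest instance count in [0, max_workers] that is stable.

    Assumes monotonic behaviour: if n instances are stable, so is any
    smaller count. `is_stable` is a hypothetical check (for example,
    "no stalls seen in a batch of test jobs").
    """
    lo, hi = 0, max_workers
    while lo < hi:
        # Round up so that the first probe for 20 instances is 10.
        mid = (lo + hi + 1) // 2
        if is_stable(mid):
            lo = mid        # mid works, try more instances
        else:
            hi = mid - 1    # mid is too many, try fewer
    return lo


def instances_to_stop(total: int, keep: int) -> List[int]:
    """Instance numbers to stop when only `keep` of `total` should run."""
    return list(range(keep + 1, total + 1))
```

With 20 starting instances the first probe is 10, and `instances_to_stop(20, 10)` gives instances 11 to 20, matching the step noted above.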
Updated by okurz almost 5 years ago
I learned that masked services make salt state apply fail, so I reverted my masking and updated the salt pillars accordingly. If you want to experiment with "less workers", I suggest pinning test jobs to openqaworker-arm-3, which is reduced to 4 worker instances in parallel for now. We can run the experiment, but we will need to unmask worker instances again as soon as we have problems with salt recipe application.
Updated by SLindoMansilla almost 5 years ago
- Assignee deleted (SLindoMansilla)
Approach is not accepted by tools team.
To decide in next refinement meeting.
Updated by okurz almost 5 years ago
The approach is accepted if you use salt pillar changes rather than simply masking systemd services, so that salt does not break.
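To illustrate the difference: instead of masking units on the host, the experiment would change the configured instance count in the pillar data and let salt apply it. A minimal sketch, assuming the pillar is loaded as a nested dict; the keys `workerconf` and `numofworkers` and the helper `set_worker_count` are assumptions for illustration and may not match the real layout of workerconf.sls.

```python
import copy


def set_worker_count(pillar: dict, host: str, count: int) -> dict:
    """Return a copy of the pillar data with a new instance count for host.

    The layout pillar["workerconf"][host]["numofworkers"] is a
    hypothetical structure used only for this sketch.
    """
    if count < 1:
        raise ValueError("at least one worker instance is required")
    updated = copy.deepcopy(pillar)  # leave the input data untouched
    updated["workerconf"][host]["numofworkers"] = count
    return updated


# Example: reduce openqaworker-arm-1 from 20 to 10 instances.
pillar = {"workerconf": {"openqaworker-arm-1": {"numofworkers": 20}}}
reduced = set_worker_count(pillar, "openqaworker-arm-1", 10)
```

The changed value would then go through the usual merge request for the pillar repository, so salt and the running services stay consistent.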
Updated by zluo almost 5 years ago
#25864 is actually an old ticket that okurz has worked on
Updated by SLindoMansilla almost 5 years ago
- Assignee set to mgriessmeier
- Increase QEMURAM for openqaworker-arm-1 and openqaworker-arm-3.
- Ask Santi about requirements for an ARM server for test environment (openqa.suse.de).
- Show requirements to Ralf and see if it is possible to acquire such hardware.
Updated by okurz almost 5 years ago
SLindoMansilla wrote:
- Increase QEMURAM for openqaworker-arm-1 and openqaworker-arm-3.
Yes. This would also go in line with #46190#note-88
- Ask Santi about requirements for an ARM server for test environment (openqa.suse.de).
- Show requirements to Ralf and see if it is possible to acquire such hardware.
New ARM hardware was already requested, see https://trello.com/c/JQtnALhz/6-openqa-hw-budget-planning#comment-5e185a3e9a5c3786c32fd089
Updated by okurz almost 5 years ago
- Related to action #25864: [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues" added
Updated by SLindoMansilla almost 5 years ago
Updated by SLindoMansilla over 4 years ago
- Status changed from New to Blocked
- Assignee changed from mgriessmeier to szarate
Updated by SLindoMansilla over 4 years ago
- Blocked by action #41882: all arm worker die after some time added
Updated by okurz over 4 years ago
- Status changed from Blocked to Workable
@SLindoMansilla #41882 is about machines crashing completely, not about performance issues per se. Please do not use that as blocker. If there is something specific I could help you with I am happy to help.
Updated by tjyrinki_suse about 4 years ago
- Subject changed from [sle][functional][u] performance issue of aarch64 worker: Stall detected to [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected
Updated by szarate almost 4 years ago
- Category changed from Bugs in existing tests to Infrastructure
- Assignee deleted (szarate)
I will not be looking at stalls for now...
Updated by szarate almost 4 years ago
- Status changed from Workable to Rejected
- Assignee set to szarate
I don't see it referenced anymore, and stalls + aarch64 is usually a bad combination on Caviums