action #115079
openQA Project (public) - coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
[qe-core][qem&functional] Many test failures due to low performance on arm workers
Status: New
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2022-08-08
Due date:
% Done: 0%
Estimated time:
Difficulty:
Tags:
Description
In the past few weeks I have hit many sporadic but very frequent failures, where jobs have to be restarted several times before they pass.
Based on openQA's test results, most of these issues are only seen on the aarch64 platform; the same tests work fine on x86_64 and s390x.
Issues we have seen
- Tests reach the maximum job time limit, which means they need more time on aarch64 than on x86_64 and s390x
- "send_key" operations do not work reliably or get no response, even with some retry logic in place
- "script_run" commands need more time to return an exit code on the aarch64 platform, especially when scraping logs in the serial console
- For some installation tests, failures with "QEMURAM=1024" are seen very often, but there is no such issue on x86_64 and s390x
Current workarounds/fixes
- Increase the resources for each job [this has been used to increase the memory size]
- Increase the timeout values for the scripts
- Remove some test modules which don't affect what the test verifies but often fail because of performance issues
- Add some retry logic for the commands within the test scripts
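
To make the timeout and retry workarounds concrete, here is a minimal sketch of the idea. It is written in Python for illustration only and is not the openQA test API; the names `ARCH_TIMEOUT_FACTOR`, `scaled_timeout` and `run_with_retry` are hypothetical, and the factor of 2.0 for aarch64 is an assumed value, not a measurement.

```python
import time

# Hypothetical per-architecture multipliers; the values are illustrative, not measured.
ARCH_TIMEOUT_FACTOR = {"x86_64": 1.0, "s390x": 1.0, "aarch64": 2.0}


def scaled_timeout(base_seconds, arch):
    """Return the base timeout multiplied by the factor for the given architecture."""
    # Unknown architectures fall back to the unscaled timeout.
    return base_seconds * ARCH_TIMEOUT_FACTOR.get(arch, 1.0)


def run_with_retry(command, retries=3, delay=0.0):
    """Call command() until it returns 0 or the attempts are used up.

    command is any callable returning an exit code.
    Returns a tuple (last_exit_code, attempts_made).
    """
    exit_code = None
    for attempt in range(1, retries + 1):
        exit_code = command()
        if exit_code == 0:
            return exit_code, attempt
        if attempt < retries:
            # Wait before the next attempt, but not after the last one.
            time.sleep(delay)
    return exit_code, retries


# A stand-in command that fails twice and then succeeds.
results = iter([1, 1, 0])
print(run_with_retry(lambda: next(results)))  # (0, 3)
print(scaled_timeout(90, "aarch64"))          # 180.0
```

Keeping the scaling factor in one table would mean that slower workers get longer timeouts without every test script carrying its own aarch64 special case, though that is only one possible design.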
Expected results
I don't know whether low performance is expected on the aarch64 platform. However, we may have to handle these failures during the daily openQA review.
My personal suggestions are:
- Order new arm workers with higher performance [e.g. new CPU modules/high-speed storage]
- Check with the kernel team/performance team to see whether there are fixes/patches for the performance issue on the aarch64 platform.