action #115079
openQA Project (public) - coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA
[qe-core][qem&functional] Many test failures due to low performance on arm workers
Description
Over the past few weeks, I have hit many sporadic (but very frequent) failures where we have to restart the affected jobs several times before they pass.
Most of these issues are seen only on the aarch64 platform; according to openQA's test results, the same tests work fine on x86_64 and s390x.
Issues we have seen
- Tests reach the maximum job time limit (MAX_JOB_TIME), which means they need more time on aarch64 than on x86_64 and s390x
- "send_key" operations do not work reliably or get no response, even with some retry logic in place
- "script_run" commands need more time to return an exit code on the aarch64 platform, especially when collecting logs over the serial console
- For some installation tests, failures with "QEMURAM=1024" are seen very often, but not on x86_64 and s390x
Current workarounds/fixes
- Increase the resources for each job [so far, the memory size has been increased]
- Increase the timeout values for the scripts
- Remove test modules that are not essential to the test's purpose but often fail due to performance issues
- Add retry logic for commands within the test scripts
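The timeout and retry workarounds above can be sketched roughly as follows. This is only an illustration in Python, not the code used in the openQA tests: the names `TIMEOUT_SCALE`, `scaled_timeout` and `run_with_retry` are hypothetical, and the scale factors are made-up values, not measurements from the arm workers.

```python
import time
from typing import Callable

# Hypothetical per-architecture timeout multipliers (illustrative values only).
TIMEOUT_SCALE = {"aarch64": 3.0, "x86_64": 1.0, "s390x": 1.0}


def scaled_timeout(base_seconds: float, arch: str) -> float:
    """Return the base timeout multiplied by the factor for this architecture."""
    return base_seconds * TIMEOUT_SCALE.get(arch, 1.0)


def run_with_retry(command: Callable[[float], int], base_timeout: float,
                   arch: str, retries: int = 3, delay: float = 0.0) -> int:
    """Call command(timeout) until it returns 0 or the attempts run out.

    `command` stands in for any step that takes a timeout and returns an
    exit code. The last exit code is returned.
    """
    timeout = scaled_timeout(base_timeout, arch)
    exit_code = -1
    for attempt in range(1, retries + 1):
        exit_code = command(timeout)
        if exit_code == 0:
            break
        if attempt < retries:
            time.sleep(delay)  # optional pause before the next attempt
    return exit_code


if __name__ == "__main__":
    seen = []

    def flaky(timeout: float) -> int:
        # Fails twice, then succeeds, to mimic a sporadic failure.
        seen.append(timeout)
        return 0 if len(seen) == 3 else 1

    print(run_with_retry(flaky, 30, "aarch64"), seen)
```

Keeping the scale factor in one place means a slow platform can get longer timeouts without editing every call site, but it only hides the underlying performance problem rather than fixing it.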
Expected results
I don't know whether low performance is expected on the aarch64 platform. However, we may have to handle these failures during the daily openQA review.
My personal suggestions are:
- Order new arm workers with higher performance [e.g. new CPU modules/high-speed storage]
- Check with the kernel team/performance team to see whether fixes or patches are available for the performance issue on the aarch64 platform
Updated by rfan1 over 2 years ago
- Related to action #114959: [qem][qe-core]test fails in logs_from_installation_system, "wait_countdown_stop" function can't stop auto reboot process added
Updated by rfan1 over 2 years ago
- Related to action #114688: [qe-core][qem] test fails in hostname_inst added
Updated by rfan1 over 2 years ago
- Related to action #114854: [qem][qe-core][aarch64]test fails in yast2_nfs_server, took to long time to get return in serial terminal with ( journalctl -fu nfs-server -o short-precise > /dev/ttyAMA0 & ) added
Updated by rfan1 over 2 years ago
- Related to action #114956: [qem][qe-core][aarch64]qam-minimal+base,test execution exceeded MAX_JOB_TIME added
Updated by rfan1 over 2 years ago
- Related to action #113396: [qe-core]test fails in logs_from_installation_system due to 'wait_countdown_stop' function doesn't work fine, performance issue? added
Updated by rfan1 over 2 years ago
- Subject changed from [qem][qe-core] Many test failures due to low performance on arm workers to [qe-core][qem&functional] Many test failures due to low performance on arm workers
Updated by rfan1 over 2 years ago
- Related to action #115886: [qe-core][sle15sp5][functional][aarch64]test fails in pkcon, timeout with cmd "pkcon install coreutils --allow-reinstall --allow-downgrade -y" added
Updated by rfan1 about 2 years ago
- Related to action #119134: [qem][qe-core]test fails in clone, yast clone_system seems hang added
Updated by rfan1 about 2 years ago
- Related to action #119524: [qe-core]test fails in await_install added
Updated by szarate about 1 year ago
- Tags changed from bugbusters to platform-team
- Parent task set to #102906
Updated by slo-gin about 1 month ago
This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.