action #115079: [qe-core][qem&functional] Many test failures due to low performance on arm workers - openQA Tests (public) - openSUSE Project Management Tool

action #115079

open

openQA Project (public) - coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

[qe-core][qem&functional] Many test failures due to low performance on arm workers

Added by rfan1 over 2 years ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2022-08-08
Due date:
% Done:
Estimated time:
Difficulty:
Tags: platform-team

Description

In the past few weeks, I have hit many sporadic (but very frequent) failures where the jobs have to be restarted several times before they pass.
Most of these issues are seen only on the aarch64 platform; according to openQA's test results, the same tests work fine on x86_64 and s390x.

Issues we have seen

Tests reach the maximum job time limit (MAX_JOB_TIME), which means they need more time on aarch64 than on x86_64 and s390x
"send_key" operations do not work reliably or get no response, even with some retry logic in place
"script_run" commands need more time to return an exit code on the aarch64 platform, especially when scraping logs through the serial console
For some installation tests, failures with "QEMURAM=1024" are seen very often, while the same setting causes no such issue on x86_64 and s390x

Current workarounds/fixes

Increase the resources for each job [so far, mostly the memory size]
Increase the timeout values in the test scripts
Remove some test modules that do not affect what the test verifies but often fail due to performance issues
Add retry logic for the commands within the test scripts
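
The timeout and retry workarounds above can be combined, as in the following minimal sketch. It is written in plain Python for illustration and is not the openQA test API: the names `ARCH_TIMEOUT_SCALE`, `scaled_timeout` and `run_with_retry` are hypothetical, and the multiplier values are placeholders, not measured numbers.

```python
import time

# Hypothetical per-architecture timeout multipliers (illustrative values only,
# not measurements from the openQA workers).
ARCH_TIMEOUT_SCALE = {"x86_64": 1.0, "s390x": 1.0, "aarch64": 2.0}


def scaled_timeout(base_seconds, arch):
    """Return the base timeout multiplied by the factor for this architecture."""
    return base_seconds * ARCH_TIMEOUT_SCALE.get(arch, 1.0)


def run_with_retry(run, arch, base_timeout=30.0, retries=3, delay=0.0):
    """Call run(timeout) until it returns exit code 0 or retries are used up.

    `run` is any callable that takes a timeout in seconds and returns an exit
    code; it may raise TimeoutError, which counts as a failed attempt.
    Returns the exit code of the last attempt, or None if it timed out.
    """
    timeout = scaled_timeout(base_timeout, arch)
    rc = None
    for attempt in range(retries):
        if attempt:
            time.sleep(delay)  # wait between attempts, not before the first
        try:
            rc = run(timeout)
        except TimeoutError:
            rc = None
        if rc == 0:
            break
    return rc
```

Keeping the scaling factor in one table means a slow architecture gets longer timeouts everywhere without editing each test, while the retry loop stays a last resort for commands that still fail sporadically.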

Expected results

I don't know whether low performance is expected on the aarch64 platform. However, as long as it persists, we may have to keep handling these failures during the daily openQA review.
My personal suggestions are:

Order new arm workers with higher performance [e.g. newer CPU models/high-speed storage]
Check with the kernel team/performance team to see whether there are fixes/patches for the performance issue on the aarch64 platform

Related issues 8 (0 open — 8 closed)

Related to openQA Tests (public) - action #114959: [qem][qe-core]test fails in logs_from_installation_system, "wait_countdown_stop" function can't stop auto reboot process (Resolved, rfan1, 2022-08-04)

Related to openQA Tests (public) - action #114688: [qe-core][qem] test fails in hostname_inst (Resolved, rfan1, 2022-07-26)

Related to openQA Tests (public) - action #114854: [qem][qe-core][aarch64]test fails in yast2_nfs_server, took to long time to get return in serial terminal with ( journalctl -fu nfs-server -o short-precise > /dev/ttyAMA0 & ) (Resolved, rfan1, 2022-08-01)

Related to openQA Tests (public) - action #114956: [qem][qe-core][aarch64]qam-minimal+base,test execution exceeded MAX_JOB_TIME (Resolved, rfan1, 2022-08-04)

Related to openQA Tests (public) - action #113396: [qe-core]test fails in logs_from_installation_system due to 'wait_countdown_stop' function doesn't work fine, performance issue? (Resolved, rfan1, 2022-07-08)

Related to openQA Tests (public) - action #115886: [qe-core][sle15sp5][functional][aarch64]test fails in pkcon, timeout with cmd "pkcon install coreutils --allow-reinstall --allow-downgrade -y" (Resolved, dvenkatachala, 2022-08-29)

Related to openQA Tests (public) - action #119134: [qem][qe-core]test fails in clone, yast clone_system seems hang (Resolved, zluo, 2022-10-21)

Related to openQA Tests (public) - action #119524: [qe-core]test fails in await_install (Resolved, rfan1)