action #115079

open

openQA Project (public) - coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

[qe-core][qem&functional] Many test failures due to low performance on arm workers

Added by rfan1 over 2 years ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Bugs in existing tests
Target version:
-
Start date:
2022-08-08
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Over the past few weeks, I have hit many sporadic (but very frequent) failures in tests that we had to restart several times before they passed.
Most of these failures appear only on the aarch64 platform; according to openQA's test results, the same tests work fine on x86_64 and s390x.

Issues we have seen

  • Tests reach the maximum job time limit, which means they need more time on aarch64 than on x86_64 and s390x
  • "send_key" operations do not work reliably or get no response, even with some retry logic in place
  • "script_run" commands need more time to return an exit code on the aarch64 platform, especially when collecting logs over the serial console
  • For some installation tests, failures with "QEMURAM=1024" are seen very often on aarch64, but not on x86_64 and s390x

Current workarounds/fixes

  • Increase the resources for each job [so far used to increase the memory size]
  • Increase the timeout values for the scripts
  • Remove test modules that are not essential to what the test verifies but often fail due to performance issues
  • Add retry logic for the commands within the test scripts
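
The timeout and retry workarounds above can be combined in one helper. The following is a minimal sketch in Python, not openQA test code (openQA tests are written in Perl) and not an openQA API. The names `TIMEOUT_SCALE`, `scaled_timeout` and `run_with_retry` are hypothetical, and the scale factors are illustrative values, not measurements from the arm workers.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical per-architecture multipliers; illustrative values, not measured.
TIMEOUT_SCALE: Dict[str, float] = {"x86_64": 1.0, "s390x": 1.0, "aarch64": 2.0}


def scaled_timeout(base_seconds: float, arch: str) -> float:
    """Return the base timeout multiplied by the factor for the architecture."""
    return base_seconds * TIMEOUT_SCALE.get(arch, 1.0)


def run_with_retry(command: Callable[[float], Optional[int]],
                   base_timeout: float,
                   arch: str,
                   retries: int = 3,
                   delay: float = 0.0) -> int:
    """Call command(timeout) up to `retries` times until it returns 0.

    `command` is assumed to return an exit code, or None if it timed out.
    """
    timeout = scaled_timeout(base_timeout, arch)
    last: Optional[int] = None
    for attempt in range(1, retries + 1):
        last = command(timeout)
        if last == 0:
            return 0
        if attempt < retries:
            time.sleep(delay)  # pause before the next attempt
    raise RuntimeError(
        f"command failed after {retries} attempts (last result: {last!r})"
    )
```

Keeping the scale factor in one table means a slow platform gets longer timeouts everywhere without editing each test, while the retry count stays bounded so a real regression still surfaces as a failure instead of being retried indefinitely.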

Expected results

I don't know whether low performance is expected on the aarch64 platform. However, we may have to handle these failures during the daily openQA review.
My personal suggestions are:

  1. Order new arm workers with higher performance [e.g. new CPU modules/high-speed storage]
  2. Check with the kernel team/performance team whether any fixes or patches are available for the performance issue on the aarch64 platform.

Related issues 8 (0 open, 8 closed)

Related to openQA Tests (public) - action #114959: [qem][qe-core]test fails in logs_from_installation_system, "wait_countdown_stop" function can't stop auto reboot process (Resolved, rfan1, 2022-08-04)

Related to openQA Tests (public) - action #114688: [qe-core][qem] test fails in hostname_inst (Resolved, rfan1, 2022-07-26)

Related to openQA Tests (public) - action #114854: [qem][qe-core][aarch64]test fails in yast2_nfs_server, took to long time to get return in serial terminal with ( journalctl -fu nfs-server -o short-precise > /dev/ttyAMA0 & ) (Resolved, rfan1, 2022-08-01)

Related to openQA Tests (public) - action #114956: [qem][qe-core][aarch64]qam-minimal+base,test execution exceeded MAX_JOB_TIME (Resolved, rfan1, 2022-08-04)

Related to openQA Tests (public) - action #113396: [qe-core]test fails in logs_from_installation_system due to 'wait_countdown_stop' function doesn't work fine, performance issue? (Resolved, rfan1, 2022-07-08)

Related to openQA Tests (public) - action #115886: [qe-core][sle15sp5][functional][aarch64]test fails in pkcon, timeout with cmd "pkcon install coreutils --allow-reinstall --allow-downgrade -y" (Resolved, dvenkatachala, 2022-08-29)

Related to openQA Tests (public) - action #119134: [qem][qe-core]test fails in clone, yast clone_system seems hang (Resolved, zluo, 2022-10-21)

Related to openQA Tests (public) - action #119524: [qe-core]test fails in await_install (Resolved, rfan1)

#1

Updated by rfan1 over 2 years ago

  • Related to action #114959: [qem][qe-core]test fails in logs_from_installation_system, "wait_countdown_stop" function can't stop auto reboot process added

#2

Updated by rfan1 over 2 years ago

  • Related to action #114688: [qe-core][qem] test fails in hostname_inst added

#3

Updated by rfan1 over 2 years ago

  • Related to action #114854: [qem][qe-core][aarch64]test fails in yast2_nfs_server, took to long time to get return in serial terminal with ( journalctl -fu nfs-server -o short-precise > /dev/ttyAMA0 & ) added

#4

Updated by rfan1 over 2 years ago

  • Related to action #114956: [qem][qe-core][aarch64]qam-minimal+base,test execution exceeded MAX_JOB_TIME added

#5

Updated by rfan1 over 2 years ago

  • Related to action #113396: [qe-core]test fails in logs_from_installation_system due to 'wait_countdown_stop' function doesn't work fine, performance issue? added

#6

Updated by rfan1 over 2 years ago

  • Subject changed from [qem][qe-core] Many test failures due to low performance on arm workers to [qe-core][qem&functional] Many test failures due to low performance on arm workers

#7

Updated by rfan1 over 2 years ago

  • Related to action #115886: [qe-core][sle15sp5][functional][aarch64]test fails in pkcon, timeout with cmd "pkcon install coreutils --allow-reinstall --allow-downgrade -y" added

#8

Updated by rfan1 about 2 years ago

  • Related to action #119134: [qem][qe-core]test fails in clone, yast clone_system seems hang added

#9

Updated by rfan1 about 2 years ago

  • Related to action #119524: [qe-core]test fails in await_install added

#10

Updated by szarate about 1 year ago

  • Tags changed from bugbusters to platform-team
  • Parent task set to #102906

#11

Updated by slo-gin about 1 month ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
