action #25864

closed

[tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"

Added by asmorodskyi about 7 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Start date:
2017-10-09
Due date:
% Done:
0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-aarch64-toolchain_zypper@aarch64 fails in scc_registration

Reproducible

Fails since (at least) Build 288.8

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #41882: all arm worker die after some time (Resolved, okurz, 2018-10-02)

Related to openQA Tests (public) - action #60833: [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected (Rejected, szarate, 2019-12-10)

Blocks openQA Project (public) - coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls (Resolved, okurz, 2016-11-24)

Actions #1

Updated by asmorodskyi about 7 years ago

  • Subject changed from stall detected in openqaworker-arm-3 to [tools] stall detected in openqaworker-arm-3
Actions #2

Updated by asmorodskyi about 7 years ago

  • Blocks coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls added
Actions #3

Updated by okurz about 6 years ago

  • Subject changed from [tools] stall detected in openqaworker-arm-3 to [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
  • Target version set to future

https://openqa.suse.de/tests/2212567 is a job on openqaworker-arm-1 failing with a stall. http://openqa-monitoring.qa.suse.de:3000/d/Z7IkWDKmk/openqaworker-arm-1?orgId=1 reports that only 20 worker instances are running, which should be OK. openqaworker-arm-2, I think, had 30 instances enabled and was failing more often; foursixnine investigated.

Actions #4

Updated by okurz about 6 years ago

Actions #5

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

Actions #6

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

Actions #7

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

Actions #8

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

Actions #9

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

Actions #10

Updated by okurz almost 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

Actions #11

Updated by okurz almost 5 years ago

  • Related to action #41882: all arm worker die after some time added
Actions #12

Updated by okurz almost 5 years ago

  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version changed from future to Done

Performance seems to have stabilized in the meantime, possibly since I upgraded the machines from SLE12SP3 to Leap 15.1. There is still a ticket about performance problems, but it relates to the "user_settings" test module of the YaST GUI installer.

Actions #13

Updated by zluo almost 5 years ago

https://openqa.suse.de/tests/3803671#step/first_boot/7 shows that we still have the same issue with openqaworker-arm-2:18

Actions #14

Updated by zluo almost 5 years ago

  • Status changed from Resolved to Workable

https://openqa.suse.de/tests/3817918#step/first_boot/5 shows a stall detected on openqaworker-arm-1:18

Actions #15

Updated by okurz almost 5 years ago

  • Status changed from Workable to In Progress
  • Target version changed from Done to Current Sprint

I agree. That shows "worker performance issues", which particularly affect openqaworker-arm-1. #41882 is the generic ticket for that. I will keep this ticket and see what I can do. My latest proposed change, reducing the number of worker instances (https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/217), has been merged. I have yet to mask the superfluous worker instances.

EDIT 2020-01-22 19:53 CET: done; only four worker instances are active now.

Actions #16

Updated by okurz almost 5 years ago

  • Due date set to 2020-02-19
  • Status changed from In Progress to Feedback
Actions #17

Updated by okurz almost 5 years ago

  • Related to action #60833: [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected added
Actions #18

Updated by okurz almost 5 years ago

  • Due date deleted (2020-02-19)
  • Status changed from Feedback to Blocked

slindomansilla has informed me that, according to zluo's gut feeling, the reduced number of worker instances might help but "issues still happen". He will run an experiment with more RAM for the VMs in #60833, so I am setting this ticket as blocked on that one.

Actions #19

Updated by okurz almost 5 years ago

  • Status changed from Blocked to Resolved

I think #41882 is enough to track this. It seems we have improved nevertheless.
