action #25864

[tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"

Added by asmorodskyi almost 4 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Start date: 2017-10-09
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-aarch64-toolchain_zypper@aarch64 fails in scc_registration

Reproducible

Fails since (at least) Build 288.8

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Infrastructure - action #41882: all arm worker die after some time (Resolved, 2018-10-02)

Related to openQA Tests - action #60833: [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected (Rejected, 2019-12-10)

Blocks openQA Project - coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls (Resolved, 2016-11-24)

History

#1 Updated by asmorodskyi almost 4 years ago

  • Subject changed from stall detected in openqaworker-arm-3 to [tools] stall detected in openqaworker-arm-3

#2 Updated by asmorodskyi almost 4 years ago

  • Blocks coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls added

#3 Updated by okurz almost 3 years ago

  • Subject changed from [tools] stall detected in openqaworker-arm-3 to [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
  • Target version set to future

https://openqa.suse.de/tests/2212567 is a job on openqaworker-arm-1 failing with a stall. http://openqa-monitoring.qa.suse.de:3000/d/Z7IkWDKmk/openqaworker-arm-1?orgId=1 shows that (only) 20 worker instances are running, which should be OK. I think openqaworker-arm-2 had 30 instances enabled and was failing more often; foursixnine investigated.

#4 Updated by okurz almost 3 years ago

#5 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

#6 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

#7 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#8 Updated by okurz almost 3 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#9 Updated by okurz over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#10 Updated by okurz over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#11 Updated by okurz over 1 year ago

  • Related to action #41882: all arm worker die after some time added

#12 Updated by okurz over 1 year ago

  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version changed from future to Done

The performance seems to have stabilized in the meantime, possibly since I upgraded the machines from SLE12SP3 to Leap 15.1. There is still an open ticket about performance problems, but it concerns the "user_settings" test module of the YaST GUI installer.

#13 Updated by zluo over 1 year ago

https://openqa.suse.de/tests/3803671#step/first_boot/7 shows that we still have the same issue on openqaworker-arm-2:18

#14 Updated by zluo over 1 year ago

  • Status changed from Resolved to Workable

https://openqa.suse.de/tests/3817918#step/first_boot/5 shows a stall detected on openqaworker-arm-1:18

#15 Updated by okurz over 1 year ago

  • Status changed from Workable to In Progress
  • Target version changed from Done to Current Sprint

I agree. This shows "worker performance issues", which affect openqaworker-arm-1 in particular; #41882 is the generic ticket for that. I will keep this ticket and see what I can do. The latest change I proposed, reducing the number of worker instances (https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/217), has been merged. I still have to mask the superfluous worker instances.

EDIT (2020-01-22 19:53 CET): done; only four worker instances are active now.

#16 Updated by okurz over 1 year ago

  • Due date set to 2020-02-19
  • Status changed from In Progress to Feedback

#17 Updated by okurz over 1 year ago

  • Related to action #60833: [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected added

#18 Updated by okurz over 1 year ago

  • Due date deleted (2020-02-19)
  • Status changed from Feedback to Blocked

slindomansilla informed me that, according to zluo's gut feeling, the reduced number of worker instances might help but "issues still happen". He will run an experiment with more RAM for the VMs in #60833, so I am setting this ticket to blocked on that one.

#19 Updated by okurz over 1 year ago

  • Status changed from Blocked to Resolved

I think #41882 is enough to track this. It seems we have improved nevertheless.
