action #25864

[tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"

Added by asmorodskyi over 2 years ago. Updated about 1 month ago.

Status: Resolved
Start date: 09/10/2017
Priority: Normal
Due date:
Assignee: okurz
% Done: 0%
Category: Bugs in existing tests
Target version: openQA Project - Current Sprint
Difficulty:
Duration:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-aarch64-toolchain_zypper@aarch64 fails in scc_registration

Reproducible

Fails since (at least) Build 288.8

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Infrastructure - action #41882: all arm worker die after some time In Progress 02/10/2018 30/06/2020
Related to openQA Tests - action #60833: [sle][functional][u] performance issue of aarch64 worker:... New 10/12/2019
Blocks openQA Project - action #14972: [tools][epic] Improvements on backend to improve better h... New 24/11/2016

History

#1 Updated by asmorodskyi over 2 years ago

  • Subject changed from stall detected in openqaworker-arm-3 to [tools] stall detected in openqaworker-arm-3

#2 Updated by asmorodskyi over 2 years ago

  • Blocks action #14972: [tools][epic] Improvements on backend to improve better handling of stalls added

#3 Updated by okurz over 1 year ago

  • Subject changed from [tools] stall detected in openqaworker-arm-3 to [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
  • Target version set to future

https://openqa.suse.de/tests/2212567 is a job on openqaworker-arm-1 failing with a stall. http://openqa-monitoring.qa.suse.de:3000/d/Z7IkWDKmk/openqaworker-arm-1?orgId=1 reports that (only) 20 worker instances are running, which should be OK. I think openqaworker-arm-2 had 30 instances enabled and was failing more often; foursixnine investigated.

#4 Updated by okurz over 1 year ago

#5 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

#6 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262

#7 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#8 Updated by okurz over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#9 Updated by okurz about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#10 Updated by okurz about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494

#11 Updated by okurz 3 months ago

  • Related to action #41882: all arm worker die after some time added

#12 Updated by okurz 3 months ago

  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version changed from future to Done

Performance seems to have stabilized in the meantime, possibly since I upgraded the machines from SLE12SP3 to Leap 15.1. There is still an open ticket about performance problems, but it relates to the "user_settings" test module of the YaST GUI installer.

#13 Updated by zluo 2 months ago

https://openqa.suse.de/tests/3803671#step/first_boot/7 shows that we still have the same issue on openqaworker-arm-2:18.

#14 Updated by zluo 2 months ago

  • Status changed from Resolved to Workable

https://openqa.suse.de/tests/3817918#step/first_boot/5 shows a stall detected on openqaworker-arm-1:18.

#15 Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Target version changed from Done to Current Sprint

I agree. That shows "worker performance issues", which are particularly pronounced on openqaworker-arm-1. #41882 is the generic ticket for that. I will keep this ticket and see what I can do. My latest proposed change, reducing the number of worker instances, has been merged: https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/217. I have yet to mask the superfluous worker instances.

EDIT 2020-01-22 19:53 CET: done; no more than four worker instances are active now.

#16 Updated by okurz 2 months ago

  • Due date set to 19/02/2020
  • Status changed from In Progress to Feedback

#17 Updated by okurz about 1 month ago

  • Related to action #60833: [sle][functional][u] performance issue of aarch64 worker: Stall detected added

#18 Updated by okurz about 1 month ago

  • Due date deleted (19/02/2020)
  • Status changed from Feedback to Blocked

slindomansilla has informed me that, according to zluo's gut feeling, the reduced number of worker instances might help, but "issues still happen". He will run an experiment with more RAM for the VMs in #60833, so I am setting this ticket to Blocked on that one.

#19 Updated by okurz about 1 month ago

  • Status changed from Blocked to Resolved

I think #41882 is enough to track this. It seems things have improved nevertheless.
