action #25864 (closed)
[tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
Description
Updated by asmorodskyi about 7 years ago
- Subject changed from stall detected in openqaworker-arm-3 to [tools] stall detected in openqaworker-arm-3
Updated by asmorodskyi about 7 years ago
- Blocks coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls added
Updated by okurz about 6 years ago
- Subject changed from [tools] stall detected in openqaworker-arm-3 to [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
- Target version set to future
https://openqa.suse.de/tests/2212567 is a job on openqaworker-arm-1 failing with a stall. http://openqa-monitoring.qa.suse.de:3000/d/Z7IkWDKmk/openqaworker-arm-1?orgId=1 reports that only 20 worker instances are running, which should be OK. I think openqaworker-arm-2 had 30 instances enabled and was failing more often; foursixnine investigated.
Updated by okurz about 6 years ago
https://openqa.suse.de/tests/2212507#step/glxgears/7 is a failure on openqaworker-arm-2.
Updated by okurz about 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: om_proxyscc_sles12sp2_allpatterns_full_update_by_yast_aarch64
https://openqa.suse.de/tests/2244262
Updated by okurz about 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494
Updated by okurz almost 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: migration_offline_sles12sp2+alladdons_allpatterns_fullupdate_aarch64
https://openqa.suse.de/tests/2247494
Updated by okurz almost 5 years ago
- Related to action #41882: all arm worker die after some time added
Updated by okurz almost 5 years ago
- Status changed from New to Resolved
- Assignee set to okurz
- Target version changed from future to Done
The performance seems to have stabilized in the meantime, possibly since I upgraded the machines from SLE12SP3 to Leap 15.1. There is still a ticket about performance problems, but it relates to the "user_settings" test module of the YaST GUI installer.
Updated by zluo almost 5 years ago
https://openqa.suse.de/tests/3803671#step/first_boot/7 shows that we still have the same issue with openqaworker-arm-2:18.
Updated by zluo almost 5 years ago
- Status changed from Resolved to Workable
https://openqa.suse.de/tests/3817918#step/first_boot/5 shows a stall detected on openqaworker-arm-1:18.
Updated by okurz almost 5 years ago
- Status changed from Workable to In Progress
- Target version changed from Done to Current Sprint
I agree. This shows "worker performance issues", which particularly affect openqaworker-arm-1. #41882 is the generic ticket for that. I will keep this ticket open and see what I can do. The latest change I proposed, reducing the number of worker instances in https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/217 , has been merged. I still have to mask the superfluous worker instances.
EDIT 2020-01-22 19:53 CET: done; only four worker instances are active now.
Updated by okurz almost 5 years ago
- Due date set to 2020-02-19
- Status changed from In Progress to Feedback
Updated by okurz almost 5 years ago
- Related to action #60833: [qe-core][sle][functional] performance issue of aarch64 worker: Stall detected added
Updated by okurz almost 5 years ago
- Due date deleted (2020-02-19)
- Status changed from Feedback to Blocked
slindomansilla has informed me that, according to zluo's gut feeling, the reduced number of worker instances might help but "issues still happen". He will run an experiment with more RAM for the VMs in #60833, so I am setting this ticket as blocked on that one.
Updated by okurz almost 5 years ago
- Status changed from Blocked to Resolved
I think #41882 is enough to track this. It seems we have improved nevertheless.