[tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
|Category:||Bugs in existing tests|
|Target version:||openQA Project - Current Sprint|
#3 Updated by okurz over 1 year ago
- Subject changed from [tools] stall detected in openqaworker-arm-3 to [tools][functional][u] stall detected in openqaworker-arm-1 through 3 sometimes - "worker performance issues"
- Target version set to future
https://openqa.suse.de/tests/2212567 is a job on openqaworker-arm-1 failing with a stall. http://openqa-monitoring.qa.suse.de:3000/d/Z7IkWDKmk/openqaworker-arm-1?orgId=1 reports that only 20 worker instances are running, which should be ok. I think openqaworker-arm-2 had 30 instances enabled and was failing more often; foursixnine investigated.
#4 Updated by okurz over 1 year ago
https://openqa.suse.de/tests/2212507#step/glxgears/7 is a failure on openqaworker-arm-2
- Status changed from New to Resolved
- Assignee set to okurz
- Target version changed from future to Done
The performance seems to have stabilized in the meantime, maybe since I upgraded the machines from SLE12SP3 to Leap 15.1? There is still a ticket about performance problems, but it is related to the "user_settings" test module of the YaST GUI installer.
https://openqa.suse.de/tests/3803671#step/first_boot/7 shows that we still have the same issue with openqaworker-arm-2:18
- Status changed from Resolved to Workable
https://openqa.suse.de/tests/3817918#step/first_boot/5 shows a stall detection on openqaworker-arm-1:18
- Status changed from Workable to In Progress
- Target version changed from Done to Current Sprint
I agree. That shows "worker performance issues", which is particularly true for openqaworker-arm-1. #41882 is the generic ticket for that. I will keep the ticket and see what I can do. The latest change I proposed is to reduce the number of worker instances: https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/217 (merged). I have yet to mask the superfluous worker instances.
EDIT: 2020-01-22 19:53 CET: done, four worker instances are active now, no more.
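For reference, masking the superfluous instances after lowering the instance count could look roughly like the sketch below. It only builds the command and does not run anything. The unit name pattern `openqa-worker@N`, the helper names, and the numbers (20 configured, 4 kept) are assumptions for illustration, not taken from the merge request.

```python
from typing import List

# Assumed systemd unit naming for worker instances; adjust if the host differs.
UNIT_TEMPLATE = "openqa-worker@{}"


def superfluous_worker_units(configured: int, keep: int,
                             template: str = UNIT_TEMPLATE) -> List[str]:
    """Return the unit names of instances keep+1 .. configured."""
    if keep < 0 or configured < keep:
        raise ValueError("need 0 <= keep <= configured")
    return [template.format(i) for i in range(keep + 1, configured + 1)]


def mask_command(units: List[str]) -> List[str]:
    """Build (but do not execute) a systemctl call that stops and masks the units."""
    return ["systemctl", "mask", "--now", *units]


if __name__ == "__main__":
    # Hypothetical numbers: 20 instances were configured, 4 should stay active.
    units = superfluous_worker_units(configured=20, keep=4)
    print(" ".join(mask_command(units)))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; on a real worker host the resulting line would be reviewed and run as root.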
#18 Updated by okurz about 1 month ago
- Due date deleted (
- Status changed from Feedback to Blocked
slindomansilla has informed me that, according to zluo's gut feeling, the reduced number of worker instances might help but "issues still happen". He will run an experiment with more RAM for the VMs in #60833, so I will set this ticket to be blocked on that one.