action #135329
closed openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project (public) - coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert
s390x work demand exceeds available workers
Description
We're running into load issues with our s390x test runs and are falling behind on our product delivery.
E.g. https://openqa.suse.de/tests/12027610 blocks BCI container releases and has been in the scheduling queue for 18 hours. However, those updates are expected to leave QA within hours.
We kindly ask for a solution to this problem in a timely manner. We are obliged to deliver certain container updates within 24h, and not fulfilling this requirement can have a severe impact on some of our BCI contracts.
This is urgent.
Updated by ph03nix over 1 year ago
I will decrease the priority for the test runs in question as a quickfix, but I think we really need more workers in the long run.
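To illustrate the trade-off described in this comment, here is a toy model in Python. It is only a sketch, not openQA's scheduler: the function `schedule`, the job names, the durations, and the convention that a lower priority value is picked first are all assumptions made for the example. It shows that re-prioritizing moves one job to the front at the expense of the others, while an extra worker shortens the wait without penalizing anyone.

```python
import heapq


def schedule(jobs, workers):
    """Return {job_name: start_time} for jobs queued at time 0.

    jobs: list of (name, prio, duration) tuples (hypothetical data).
    Assumption: a lower prio value is picked first; ties are broken
    by submission order.
    """
    # Pending jobs ordered by (prio, submission index).
    queue = [(prio, idx, name, duration)
             for idx, (name, prio, duration) in enumerate(jobs)]
    heapq.heapify(queue)

    # free_at holds the time at which each worker becomes available.
    free_at = [0] * workers
    heapq.heapify(free_at)

    starts = {}
    while queue:
        _, _, name, duration = heapq.heappop(queue)
        start = heapq.heappop(free_at)  # earliest available worker
        starts[name] = start
        heapq.heappush(free_at, start + duration)
    return starts


# Three jobs with equal priority on a single worker: the container job waits.
same_prio = [("maintenance-a", 50, 2), ("maintenance-b", 50, 2), ("bci-container", 50, 1)]
print(schedule(same_prio, workers=1))

# Quick fix: move the container job ahead; the other jobs now start later.
bumped = [("maintenance-a", 50, 2), ("maintenance-b", 50, 2), ("bci-container", 30, 1)]
print(schedule(bumped, workers=1))

# Long-run fix: a second worker reduces the wait without re-prioritizing.
print(schedule(same_prio, workers=2))
```

In this model the container job starts at time 4 with equal priorities, at time 0 once it is moved ahead (pushing the other two jobs back by one time unit each), and at time 2 with a second worker and unchanged priorities.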
Updated by okurz over 1 year ago
- Target version set to Ready
If you can help us raise the concern with SUSE-IT Eng-Infra and get more CPU+RAM assigned to the OSD VM, we can increase the number of workers.
Updated by ph03nix over 1 year ago
- Related to action #135332: Ensure recent containers are released added
Updated by okurz over 1 year ago
- Status changed from New to Blocked
- Assignee set to okurz
Felix filed https://sd.suse.com/servicedesk/customer/portal/1/SD-131786 for it. I shared with "OSD Admins". All others should just track this ticket #135329
Updated by ph03nix over 1 year ago
I filed https://sd.suse.com/servicedesk/customer/portal/1/SD-131786 for it. Anyone who needs access, just ping me in Slack.
Updated by ph03nix over 1 year ago
Most urgency is resolved for us now. Thanks for looking into this!
From my POV this ticket can be closed, unless you peeps need to have it open for further work.
Updated by okurz over 1 year ago
Thanks for your explicit response. As long as the SD ticket is open at least I would like to keep the ticket open. But, is the original issue regarding s390x jobs then really resolved? If yes, what would you say was the impact of you manually tweaking the jobs scheduling priorities?
Updated by ph03nix over 1 year ago
okurz wrote in #note-7:
Thanks for your explicit response. As long as the SD ticket is open at least I would like to keep the ticket open. But, is the original issue regarding s390x jobs then really resolved? If yes, what would you say was the impact of you manually tweaking the jobs scheduling priorities?
I'm not observing s390x blocking any ongoing issues at the moment, however we only notice this when things are already on fire.
So, the urgency of the task is gone, but I could not say with confidence that the load issue with s390x is resolved. I do see however ppc64le taking longer than other architectures.
Updated by okurz over 1 year ago
- Status changed from Blocked to Resolved
Ok, thx. https://sd.suse.com/servicedesk/customer/portal/1/SD-131786 was resolved; the OSD VM now has more CPU and more RAM. In a related ticket I commented that we removed the job limit again for now, so we can follow up there and resolve here.
Updated by okurz about 1 year ago
- Related to action #127523: [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker ressources added