action #135329
(closed) openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project - coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert
s390x work demand exceeds available workers
Description
We're running into load issues with our s390x test runs and are falling behind on our product delivery.
For example, https://openqa.suse.de/tests/12027610 blocks BCI container releases and has been in the scheduling queue for 18 hours. However, those updates are expected to leave QA within hours.
We kindly ask for a timely solution to this problem. We are obliged to deliver certain container updates within 24h, and failing to meet this requirement can have a severe impact on some of our BCI contracts.
This is urgent.
Updated by ph03nix 3 months ago
I filed https://sd.suse.com/servicedesk/customer/portal/1/SD-131786 for it. Anyone who needs access, just ping me in Slack.
Updated by okurz 3 months ago
Thanks for your explicit response. I would like to keep this ticket open at least as long as the SD ticket is open. But is the original issue regarding s390x jobs really resolved, then? If so, what would you say was the impact of manually tweaking the job scheduling priorities?
Updated by ph03nix 3 months ago
okurz wrote in #note-7:
Thanks for your explicit response. I would like to keep this ticket open at least as long as the SD ticket is open. But is the original issue regarding s390x jobs really resolved, then? If so, what would you say was the impact of manually tweaking the job scheduling priorities?
At the moment I don't see s390x blocking anything; however, we only notice this when things are already on fire.
So the urgency is gone, but I cannot say with confidence that the s390x load issue is resolved. I do, however, see ppc64le jobs taking longer than those on other architectures.
Updated by okurz 3 months ago
- Status changed from Blocked to Resolved
OK, thanks. https://sd.suse.com/servicedesk/customer/portal/1/SD-131786 was resolved; the OSD VM now has more CPU and more RAM. In a related ticket I commented that we have removed the job limit again for now, so we can follow up there and resolve this one.
Updated by okurz 3 months ago
- Related to action #127523: [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources added