action #135329
closed
openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project (public) - coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert
s390x work demand exceeds available workers
Added by ph03nix over 1 year ago.
Updated over 1 year ago.
Description
We're running into load issues with our s390x test runs, and this is impacting our product delivery.
For example, https://openqa.suse.de/tests/12027610 blocks BCI container releases and has been sitting in the scheduling queue for 18 hours, even though those updates are expected to leave QA within hours.
We kindly ask for a solution to this problem in a timely manner. We are obliged to deliver certain container updates within 24h, and not fulfilling this requirement can have a severe impact on some of our BCI contracts.
This is urgent.
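For reference, a minimal Python sketch of how one might quantify the backlog described above by querying the public openQA jobs route. The `state`/`arch` filters and the shape of the JSON response (a `jobs` list whose entries carry `id` and `priority`) are assumptions to be checked against the instance's API documentation:

```python
#!/usr/bin/env python3
"""Rough sketch: count scheduled s390x jobs on an openQA instance."""
import requests

OPENQA = "https://openqa.suse.de"  # instance from the report above


def scheduled_s390x_jobs():
    # assumption: GET /api/v1/jobs accepts `state` and `arch` filters
    resp = requests.get(
        f"{OPENQA}/api/v1/jobs",
        params={"state": "scheduled", "arch": "s390x"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("jobs", [])


if __name__ == "__main__":
    jobs = scheduled_s390x_jobs()
    print(f"{len(jobs)} s390x jobs waiting in the scheduling queue")
    # show the ten jobs the scheduler would pick first (lowest priority value)
    for job in sorted(jobs, key=lambda j: j.get("priority") or 50)[:10]:
        print(f"  job {job['id']}: priority {job.get('priority')}")
```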
I will decrease the priority value for the test runs in question as a quick fix (a lower value means the scheduler picks them up sooner), but I think we really need more workers in the long run.
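For illustration, a hedged sketch of that quick fix, wrapping `openqa-cli api`, which handles API-key authentication from the usual client config. In openQA a lower numeric priority means the scheduler picks the job up earlier (50 is the common default); the `jobs/<id>/prio` route and `prio=` parameter used here are assumptions, so verify them against the instance's API documentation first:

```python
#!/usr/bin/env python3
"""Rough sketch: lower the scheduling priority value of selected jobs."""
import subprocess

HOST = "https://openqa.suse.de"
URGENT_JOBS = [12027610]   # e.g. the blocked BCI job from the description
NEW_PRIO = 30              # lower value = picked up by the scheduler sooner

for job_id in URGENT_JOBS:
    # openqa-cli reads the API key/secret from the client config;
    # the jobs/<id>/prio route and prio= parameter are assumptions here
    subprocess.run(
        ["openqa-cli", "api", "--host", HOST, "-X", "PUT",
         f"jobs/{job_id}/prio", f"prio={NEW_PRIO}"],
        check=True,
    )
    print(f"requested priority {NEW_PRIO} for job {job_id}")
```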
- Target version set to Ready
If you can help us raise the concern with SUSE-IT Eng-Infra and get more CPU+RAM assigned to the OSD VM, we can increase the number of worker instances.
- Related to action #135332: Ensure recent containers are released added
- Status changed from New to Blocked
- Assignee set to okurz
Most of the urgency is resolved for us now. Thanks for looking into this!
From my POV this ticket can be closed, unless you peeps need to have it open for further work.
Thanks for your explicit response. At least as long as the SD ticket is open I would like to keep this ticket open. But is the original issue regarding s390x jobs really resolved? If yes, what would you say was the impact of your manually tweaking the job scheduling priorities?
- Parent task set to #135122
okurz wrote in #note-7:
Thanks for your explicit response. At least as long as the SD ticket is open I would like to keep this ticket open. But is the original issue regarding s390x jobs really resolved? If yes, what would you say was the impact of your manually tweaking the job scheduling priorities?
I'm not observing s390x blocking anything at the moment; however, we only notice this when things are already on fire.
So the urgency is gone, but I could not say with confidence that the s390x load issue is resolved. I do, however, see ppc64le jobs taking longer than on other architectures.
- Status changed from Blocked to Resolved
- Related to action #127523: [qe-core][s390x][kvm] Make use of generic "s390-kvm" class to prevent too long waiting for s390x worker resources added