action #139103
openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project - coordination #139010: [epic] Long OSD ppc64le job queue
Long OSD ppc64le job queue - Decrease number of x86_64 worker slots on osd to give ppc64le jobs a better chance to be assigned jobs size:M
Start date: 2023-11-04
Due date:
% Done: 0%
Estimated time:
Description
Motivation
Currently on OSD there is a longer job queue, in particular for ppc64le, for multiple reasons, see #139010. One idea is to decrease the number of x86_64 worker slots on OSD so that ppc64le jobs have a better chance of being assigned despite the OSD openQA instance job limit.
Acceptance criteria
- AC1: The impact of worker instance ratio by arch/class has been verified
- AC2: Given the openQA instance job limit is impacting the ppc64le job queue, When the ratio of ppc64le/all workers has been increased, Then the ppc64le job age is lower
Suggestions
- DONE Look up current number of x86_64 and qemu ppc64le jobs assuming that we have a very low ppc64le/all ratio, e.g. many workers for qemu_x86_64 and very few for qemu_ppc64le (16 as of 2023-11-04).
- DONE Reduce number of x86_64 qemu slots if we have "too many"
- Monitor for the impact on qemu_ppc64le job age
- Increase the number of ppc64le machines and then re-enable the x86_64 machines
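As a rough aid for the ratio lookup above, here is a minimal sketch of how the ppc64le/all slot ratio could be computed from a list of worker classes. Only the 16 qemu_ppc64le slots come from this ticket; the x86_64 slot counts and the helper name `worker_class_share` are hypothetical placeholders, and the slot share is just one indicator, not a statement about how the scheduler actually assigns jobs.

```python
from collections import Counter


def worker_class_share(worker_classes):
    """Return each worker class's share of all slots as a value in 0.0..1.0."""
    counts = Counter(worker_classes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


# 16 qemu_ppc64le slots is the figure from this ticket (as of 2023-11-04).
# The qemu_x86_64 counts (184 and 84) are made-up placeholders for illustration.
before = worker_class_share(["qemu_ppc64le"] * 16 + ["qemu_x86_64"] * 184)
after = worker_class_share(["qemu_ppc64le"] * 16 + ["qemu_x86_64"] * 84)

print(f"ppc64le share before: {before['qemu_ppc64le']:.2%}")
print(f"ppc64le share after:  {after['qemu_ppc64le']:.2%}")
```

With real slot counts substituted, comparing the share before and after disabling x86_64 slots gives a number to correlate with the observed qemu_ppc64le job age.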
Rollback steps
- Re-enable openQA OSD workers w35-w36 and remove the corresponding alert silence https://monitor.qa.suse.de/alerting/silence/e2c36842-e6a9-4d48-aeef-330c3d8604c7/edit?alertmanager=grafana
- Revert https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/687 to enable multi-machine tests after ensuring stability
Out of scope
- Any code changes for the scheduler