openQA Project - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
Increase number of CPU cores on OSD VM due to high usage size:S
With the growing number of workers and tests processed in parallel, as well as the increased number of products tested on OSD and of users using the system, the workload on OSD constantly increases. CPU load alerts have been seen recently in #96713 and the higher load is visible in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=25 . We should increase the number of CPU cores on the OSD VM due to the higher usage.
- AC1: A reasonably higher number of cores than the current 10 is available on OSD
- Create an EngInfra ticket to increase the number of CPU cores for the OSD VM, describing the above motivation as well as referencing https://progress.opensuse.org/projects/openqav3/wiki/Wiki#openQA-infrastructure-needs-o3-osd . If EngInfra still wants the budget confirmed, involve runger
- Consider updating the configuration of how we start openQA processes to make good use of the additional cores
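As a minimal sketch of what "making good use of the additional cores" could look like, assuming the common pattern of sizing a service's process pool from the core count (the multiplier and the variable names here are illustrative assumptions, not actual openQA settings):

```shell
#!/bin/sh
# Illustrative only: derive a suggested process-pool size from the core count.
# The factor of 2 is an assumption for I/O-bound web workloads, not an openQA default.
cores=$(nproc)
suggested_workers=$((cores * 2))
echo "cores=$cores suggested_workers=$suggested_workers"
```

After a core increase, any such statically configured pool sizes would need to be reviewed and raised accordingly rather than relying on the defaults picked for the old core count.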
- Status changed from Workable to Feedback
I filed the ticket using the new ticket system as described on https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog but I couldn't find the "Request participants" field anywhere, so I just wrote a comment asking to add firstname.lastname@example.org. The ticket URL is https://sd.suse.com/servicedesk/customer/portal/1/SD-60167 but likely nobody can currently access it except myself.
Repasting from SD, as I see you can't access it:
It's currently running on morla11 and has a fairly constant CPU usage of around 50%.
The problem might be that most morlas have 48 cores, but one has only 32, so if this workload ends up on the weaker machine during maintenance, we're already at 75% and there won't be much headroom for spikes. If we want to have such a huge VM, we'll need some special handling for it to be very sure it won't affect other workloads. Does it definitely need to be one huge node, or can you have multiple smaller workers distributed more evenly over the cluster?
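The capacity argument above can be checked with a quick back-of-the-envelope calculation: 50% utilisation of a 48-core host corresponds to roughly 24 busy cores, and the same amount of work landing on the 32-core host yields 75% utilisation.

```shell
#!/bin/sh
# Back-of-the-envelope check of the load figures quoted above.
busy_cores=$((48 * 50 / 100))                  # ~50% of 48 cores
pct_on_small_host=$((busy_cores * 100 / 32))   # same work on the 32-core morla
echo "busy_cores=$busy_cores pct_on_small_host=$pct_on_small_host"
# -> busy_cores=24 pct_on_small_host=75
```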