action #97943
Parent task (closed): openQA Project (public) - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
Increase number of CPU cores on OSD VM due to high usage size:S
Description
Motivation
With the growing number of workers and tests processed in parallel, as well as the increased number of products tested on OSD and users using the system, the workload on OSD is constantly increasing. CPU load alerts were seen recently in #96713, and the higher load is visible in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=25 . We should increase the number of CPU cores on the OSD VM to handle the higher usage.
Acceptance criteria
- AC1: A reasonably higher number of cores than the current 10 is available on OSD
Suggestions
- Create an EngInfra ticket to increase the number of CPU cores for the OSD VM, describing the motivation above and referencing https://progress.opensuse.org/projects/openqav3/wiki/Wiki#openQA-infrastructure-needs-o3-osd . If EngInfra still wants the budget confirmed, involve runger
- Consider updating the configuration of how we start openQA processes to make good use of the additional cores
Updated by okurz over 3 years ago
- Copied from action #96713: Slow grep in openqa-label-known-issues leads to high CPU usage added
Updated by okurz over 3 years ago
- Subject changed from Increase number of CPU cores on OSD VM due to high usage to Increase number of CPU cores on OSD VM due to high usage size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 3 years ago
- Status changed from Workable to Feedback
I filed the ticket using the new ticket system as described on https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog but I couldn't find the "Request participants" field anywhere, so I just wrote a comment asking to add osd-admins@suse.de. The ticket URL is https://sd.suse.com/servicedesk/customer/portal/1/SD-60167 but likely nobody except me can currently access it.
Updated by ghormoon over 3 years ago
Repasting from SD, as I see you can't access it:
It's currently running on morla11 and it has fairly constant CPU usage of around 50%.
The problem might be that most morlas have 48 cores, but one has only 32, so if this workload ends up on the weaker one during maintenance, we're already at 75% and there won't be much headroom for spikes. If we want to have such a huge VM, we'll need some special handling for it to be very sure it won't affect other workloads. Does it definitely need to be one huge node, or can you have multiple smaller workers distributed more evenly over the cluster?
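The 75% figure follows from simple arithmetic. The sketch below assumes that the 50% refers to utilization of a 48-core host and that the absolute CPU demand stays the same when the workload moves to the 32-core host; the function name is hypothetical and only serves as illustration.

```python
def projected_utilization(current_fraction, current_cores, target_cores):
    """Estimate utilization after moving the same load to another host.

    Assumes the absolute CPU demand (in busy cores) does not change.
    """
    busy_cores = current_fraction * current_cores  # 0.50 * 48 = 24 busy cores
    return busy_cores / target_cores


# Same load on the weaker 32-core host: 24 / 32
print(projected_utilization(0.50, 48, 32))  # 0.75
```

Under these assumptions only 8 of the 32 cores would remain free for load spikes.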
Updated by mkittler about 3 years ago
As mentioned in the Jira ticket it currently needs to be one host. I suppose we need to decide in the team how we'd like to proceed here.
Updated by livdywan about 3 years ago
Discussed it in the weekly. From what I understand:
- We can't get more CPU cores
- Additional VMs could be provided
- Dedicated hardware instead of a VM could also be provided
@mkittler please ask if we can get shared storage between multiple VMs
Updated by ghormoon about 3 years ago
Dedicated HW is usually bought by the team itself. I can discuss this with Evzenia when she comes back from vacation.
It is technically possible to have an NFS share if you need shared storage and that helps you run more, smaller VMs. How much storage would you need?
Updated by mkittler about 3 years ago
- Status changed from Feedback to Resolved
I'm resolving this ticket with the outcome that it is not possible to extend the VM. It would be possible to request another VM so I created #99549 to split our workload.
Updated by okurz about 3 years ago
- Related to coordination #99549: [epic] Split production workload onto multiple hosts (focusing on OSD) added
Updated by okurz almost 3 years ago
- Related to action #107701: [osd] Job detail page fails to load added