action #97943: Increase number of CPU cores on OSD VM due to high usage size:S - openQA Infrastructure (public) - openSUSE Project Management Tool

action #97943

closed

openQA Project (public) - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

Increase number of CPU cores on OSD VM due to high usage size:S

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Target version: openQA Project (public) - Ready

Description

Motivation¶

As the number of workers and tests processed in parallel grows, along with the number of products tested on OSD and the number of users of the system, the workload on OSD keeps increasing. CPU load alerts were seen recently in #96713, and the higher load is visible in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=25 . We should therefore increase the number of CPU cores on the OSD VM.

Acceptance criteria¶

AC1: A reasonably higher number of CPU cores than the current 10 is available on OSD

Suggestions¶

Create an EngInfra ticket to increase the number of CPU cores for the OSD VM, describing the motivation above and referencing https://progress.opensuse.org/projects/openqav3/wiki/Wiki#openQA-infrastructure-needs-o3-osd . If EngInfra still wants the budget confirmed, involve runger
Consider updating the configuration of how we start openQA processes to make good use of the additional cores

Related issues 3 (1 open — 2 closed)

Updated by okurz over 3 years ago

Copied from action #96713: Slow grep in openqa-label-known-issues leads to high CPU usage added

Updated by okurz over 3 years ago

Subject changed from Increase number of CPU cores on OSD VM due to high usage to Increase number of CPU cores on OSD VM due to high usage size:S
Description updated (diff)
Status changed from New to Workable

Updated by mkittler over 3 years ago

Assignee set to mkittler

Updated by mkittler over 3 years ago

Status changed from Workable to Feedback

I filed the ticket using the new ticket system as described on https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog , but I couldn't find the "Request participants" field anywhere, so I just wrote a comment asking to add osd-admins@suse.de. The ticket URL is https://sd.suse.com/servicedesk/customer/portal/1/SD-60167 , but likely nobody except me can currently access it.

Updated by ghormoon over 3 years ago

Repasting from SD, as I see you can't access it:

It's currently running on morla11 and it has fairly constant CPU usage of around 50%.

The problem might be that most morlas have 48 cores, but one has only 32, so if this workload ends up on the weaker one during maintenance, we're already at 75% and there won't be much headroom for spikes. If we want such a huge VM, we'll need some special handling for it to be very sure it won't affect other workloads. Does it definitely need to be one huge node, or can you have multiple smaller workers distributed more evenly over the cluster?
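The 75% figure above can be reproduced with a short calculation. This is only a sketch: it assumes the "around 50%" refers to CPU usage of the whole 48-core host and that the same absolute load would move unchanged to the 32-core host. The function name `projected_utilization` is made up for illustration.

```python
def projected_utilization(current_fraction: float,
                          current_cores: int,
                          target_cores: int) -> float:
    """Utilization the same absolute CPU load would cause on another host.

    Assumption: load is expressed as a fraction of all cores on the
    current host and stays constant when moved.
    """
    # Convert the fraction into an absolute number of busy cores.
    busy_cores = current_fraction * current_cores
    # Express that same load as a fraction of the target host's cores.
    return busy_cores / target_cores


if __name__ == "__main__":
    # 50% of a 48-core host is 24 busy cores; 24 of 32 cores is 75%.
    print(f"{projected_utilization(0.50, 48, 32):.0%}")
```

Under these assumptions, only 8 of the 32 cores would remain free on the smaller host, which is the limited headroom for spikes mentioned in the comment.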

Updated by mkittler over 3 years ago

As mentioned in the Jira ticket, it currently needs to be one host. I suppose we need to decide in the team how we'd like to proceed here.

Updated by livdywan over 3 years ago

Discussed it in the weekly. From what I understand:

We can't get more CPU cores
Additional VMs could be provided
Dedicated hardware instead of a VM could also be provided

@mkittler please ask if we can get shared storage between multiple VMs

Updated by ghormoon over 3 years ago

Dedicated HW is usually bought by the team itself. I can discuss this with Evzenia when she comes back from vacation.

It is technically possible to have an NFS share if you need shared storage and that helps you run more, smaller VMs. How much would it need?

Updated by mkittler over 3 years ago

Status changed from Feedback to Resolved

I'm resolving this ticket with the outcome that it is not possible to extend the VM. It would be possible to request another VM, so I created #99549 to split our workload.

#10

Updated by okurz over 3 years ago

Related to coordination #99549: [epic] Split production workload onto multiple hosts (focusing on OSD) added

#11

Updated by okurz over 3 years ago

Parent task set to #80142

#12

Updated by okurz over 3 years ago

Related to action #107701: [osd] Job detail page fails to load added
