action #97943

closed

openQA Project - coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

Increase number of CPU cores on OSD VM due to high usage size:S

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

With the number of workers, parallel processed tests, products tested on OSD, and users of the system all growing, the workload on OSD constantly increases. CPU load alerts have been seen recently in #96713 and the higher load is visible in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=25 . We should increase the number of CPU cores on the OSD VM accordingly.

Acceptance criteria

  • AC1: A reasonably higher number of cores than the current 10 is available on OSD
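AC1 could be verified directly on the host. A minimal sketch using only the Python standard library (the threshold of 10 cores comes from the ticket; the function name is illustrative, not part of any existing tooling):

```python
import os

CURRENT_CORES = 10  # the core count AC1 refers to


def meets_ac1(current_cores: int = CURRENT_CORES) -> bool:
    """Return True if this host exposes more CPU cores than the current 10."""
    available = os.cpu_count() or 0
    return available > current_cores


print(f"cores available: {os.cpu_count()}, AC1 met: {meets_ac1()}")
```

Running this on OSD after the change should print `AC1 met: True`; on the unchanged 10-core VM it would print `False`.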

Suggestions


Related issues (3: 1 open, 2 closed)

  • Related to openQA Project - coordination #99549: [epic] Split production workload onto multiple hosts (focusing on OSD) (New, 2021-09-30)
  • Related to openQA Project - action #107701: [osd] Job detail page fails to load (Resolved, tinita, 2022-02-28)
  • Copied from openQA Infrastructure - action #96713: Slow grep in openqa-label-known-issues leads to high CPU usage (Resolved, okurz, 2021-09-10)
Actions #1

Updated by okurz over 2 years ago

  • Copied from action #96713: Slow grep in openqa-label-known-issues leads to high CPU usage added
Actions #2

Updated by okurz over 2 years ago

  • Subject changed from Increase number of CPU cores on OSD VM due to high usage to Increase number of CPU cores on OSD VM due to high usage size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler over 2 years ago

  • Status changed from Workable to Feedback

I filed the ticket using the new ticket system as described on https://progress.opensuse.org/projects/qa/wiki#How-we-work-on-our-backlog but I couldn't find the "Request participants" field anywhere, so I just wrote a comment asking to add osd-admins@suse.de. The ticket URL is https://sd.suse.com/servicedesk/customer/portal/1/SD-60167 but likely nobody except me can currently access it.

Actions #5

Updated by ghormoon over 2 years ago

Repasting from SD, as I see you can't access it:

It's currently running on morla11 and has a fairly constant CPU usage of around 50%.

The problem might be that most morlas have 48 cores, but one has only 32, so if during maintenance this workload ends up on the weaker one, we're already at 75% and there won't be much headroom for spikes. If we want such a huge VM, we'll need some special handling for it to be very sure it won't affect other workloads. Does it definitely need to be one huge node, or can you have multiple smaller workers distributed more evenly over the cluster?
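The headroom concern above is simple arithmetic: the same absolute load (about 24 busy cores, i.e. 50% of 48) becomes 75% utilization on a 32-core host. A sketch with the numbers from the comment (the function name is illustrative):

```python
def utilization_after_move(busy_fraction: float, cores_before: int, cores_after: int) -> float:
    """Express the same absolute CPU load as utilization on a differently sized host."""
    busy_cores = busy_fraction * cores_before  # absolute load, in cores
    return busy_cores / cores_after


# ~50% of a 48-core morla migrated to the 32-core one:
print(utilization_after_move(0.5, 48, 32))  # 0.75, i.e. 75%
```

This is why a maintenance migration to the weaker morla leaves little room for load spikes.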

Actions #6

Updated by mkittler over 2 years ago

As mentioned in the Jira ticket it currently needs to be one host. I suppose we need to decide in the team how we'd like to proceed here.

Actions #7

Updated by livdywan over 2 years ago

Discussed it in the weekly. From what I understand:

  • We can't get more CPU cores
  • Additional VMs could be provided
  • Dedicated hardware instead of a VM could also be provided

@mkittler please ask if we can get shared storage between multiple VMs

Actions #8

Updated by ghormoon over 2 years ago

Dedicated HW is usually bought by the team itself. I can discuss this with Evzenia when she comes back from vacation.

It is technically possible to have an NFS share if you need shared storage, if that helps you run multiple smaller VMs. How much storage would it need?

Actions #9

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

I'm resolving this ticket with the outcome that it is not possible to extend the VM. It would be possible to request another VM so I created #99549 to split our workload.

Actions #10

Updated by okurz over 2 years ago

  • Related to coordination #99549: [epic] Split production workload onto multiple hosts (focusing on OSD) added
Actions #11

Updated by okurz over 2 years ago

  • Parent task set to #80142
Actions #12

Updated by okurz about 2 years ago

  • Related to action #107701: [osd] Job detail page fails to load added