action #58562

closed

[qam] Migration of s390x jobs to ease zkvm resources

Added by brhavel over 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Infrastructure
Target version:
-
Start date:
2019-10-23
Due date:
2019-11-01
% Done:

100%

Estimated time:
Difficulty:

Description

Currently zkvm resources are running out, and the load on the zkvm machines should be reduced. This can be done by moving some of the currently running jobs to native KVM on s390x.

s390x KVM has 4 openQA workers. zkvm has 3 openQA workers.

QAM currently uses zkvm for:

  • Installation tests (once for every new incident, ~10 minutes)
  • Minimal testing (only some SLE 15 & 12 incidents, ~40 minutes)
  • Part of kernel testing (all SLE 15 & 12 kernel incidents, hours each)
  • Partial testing of 15SP1 only, twice a day (multiple hours each)

The first three happen sporadically, i.e. depending on the number of incidents. The last one is predictable and consumes resources almost constantly.

  • Moving the regular 15SP1 testing away would free up capacity on a constant basis, so it may be the best place to start.
#1

Updated by tjyrinki_suse over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to tjyrinki_suse
#2

Updated by tjyrinki_suse over 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

https://openqa.suse.de/group_overview/232 has been running fine for several days since it was moved off zkvm. Matthias confirmed I'm using the correct machine.

I think this should already help quite a bit: it frees at least 6 hours of worker time every day.

How can we evaluate whether the resource usage is now better than before?

#3

Updated by tjyrinki_suse over 4 years ago

  • Status changed from Resolved to Feedback
#4

Updated by brhavel over 4 years ago

Hello Timo, thank you for your work on this one. There is a Grafana dashboard at https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&refresh=30s which tracks the load on the openQA workers. However, I have no experience with it, so I do not know whether it is monitored per architecture. If you cannot find the results there, I would ask Matthias whether he sees any improvement in the load.

#5

Updated by okurz over 4 years ago

I can recommend https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test which shows graphs of the job queue per architecture and also for specific workers. You can also simply look at the scheduled jobs on https://openqa.suse.de/tests/ and see whether a particular openQA "machine" selection is always at the end of the line in the scheduled queue :)
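The queue check suggested above can also be scripted. This is a minimal Python sketch that tallies scheduled jobs per openQA `MACHINE` setting, largest queue first. It assumes the instance exposes `GET /api/v1/jobs?state=scheduled` and that each returned job carries a `settings` dictionary with a `MACHINE` key; the function names are hypothetical, and the API may cap or paginate results, so treat the counts as a rough indicator rather than an exact queue length.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: the endpoint returns JSON shaped like
# {"jobs": [{"id": ..., "settings": {"MACHINE": ...}}, ...]}.
API_URL = "https://openqa.suse.de/api/v1/jobs?state=scheduled"


def fetch_scheduled_jobs(url=API_URL):
    """Download the list of scheduled jobs (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)["jobs"]


def queue_depth_by_machine(jobs):
    """Return (machine, count) pairs for scheduled jobs, largest queue first.

    Jobs without a MACHINE setting are grouped under "<unknown>".
    """
    counts = Counter(
        job.get("settings", {}).get("MACHINE", "<unknown>") for job in jobs
    )
    return counts.most_common()
```

With network access, `queue_depth_by_machine(fetch_scheduled_jobs())` gives a per-machine snapshot; a machine that stays at the top across several checks is a hint that its jobs could be spread to other workers.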

#6

Updated by tjyrinki_suse over 4 years ago

  • Status changed from Feedback to In Progress

Thanks Oliver! I'm making a few more changes and will continue observing this for a while before closing it.

Grenache seems to have lots of free capacity, so I moved SLE12 kernel incident testing there.

#7

Updated by tjyrinki_suse over 4 years ago

  • Status changed from In Progress to Resolved

SLE12 kernel testing is now running fine in its new place: https://openqa.suse.de/tests/3545914

I'm going to mark this as resolved. Judging from the graphs and queues, the situation seems quite balanced for s390x now that the biggest tasks have moved to s390x-kvm-*, but please point out any remaining problem spots.

#8

Updated by maritawerner over 3 years ago

Timo, I can see that this ticket is already resolved, but I have a question. Do you know the non-technical background here? zKVM used to be an IBM product and was discontinued after SLE 12 SP3; starting with SLES 12 SP4, IBM has supported SUSE's KVM on s390x instead. So older products should be tested on zKVM, while newer products should be tested on SUSE's KVM (= native KVM)? I am not even sure whether product QA is aware of that now.

#9

Updated by tjyrinki_suse over 3 years ago

Good to know the history, I'll spread the information on our squad's channel.
