action #20914

[tools] configure vm settings for workers with rotating discs

Added by coolo about 4 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Start date: 2017-07-28
Due date: 2019-11-05
% Done: 0%
Estimated time:

Description

Especially aarch64 machines are too slow syncing qemu, so we need to tweak their configs in salt

This will cost performance, and may make the 'HMP timeout' issue more prominent, but it will also make
needle matching more predictable.

Jan Kara's recommendation, after the experiments in https://github.com/os-autoinst/os-autoinst/pull/664, is to set
dirty_bytes to 200000000 (~200 MB) and
dirty_background_bytes to 50000000 (~50 MB).

We only need this for the HDD hosts. Having it on NVMe shouldn't hurt, but I can't really say.
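
As a rough illustration of the recommended values, the following Python sketch renders them as a sysctl-style fragment. The function name `render_sysctl` and the sanity check are assumptions made for this example; the actual change lives in the salt states and is not shown in this ticket.

```python
# Values quoted in the ticket description (Jan Kara's recommendation).
DIRTY_BYTES = 200_000_000             # ~200 MB
DIRTY_BACKGROUND_BYTES = 50_000_000   # ~50 MB


def render_sysctl(dirty_bytes: int, dirty_background_bytes: int) -> str:
    """Return a sysctl-style fragment for the two writeback limits.

    Hypothetical helper for illustration only, not part of any salt state.
    """
    # Sanity check added for this sketch: background writeback should
    # start before the hard limit is reached.
    if dirty_background_bytes >= dirty_bytes:
        raise ValueError("background limit should be below the hard limit")
    lines = [
        "# Writeback limits for workers with rotating discs",
        f"vm.dirty_bytes = {dirty_bytes}",
        f"vm.dirty_background_bytes = {dirty_background_bytes}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sysctl(DIRTY_BYTES, DIRTY_BACKGROUND_BYTES), end="")
```

Note that the kernel treats `vm.dirty_bytes` and `vm.dirty_ratio` as counterparts (likewise for the background pair): writing one resets the other to 0, so only one form of each limit is in effect at a time.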


Related issues

Related to openQA Infrastructure - action #58805: [infra] Severe storage performance issue on openqa.suse.de workers (Resolved, 2019-10-29)

Related to openQA Tests - action #50615: [functional][y] test fails in await_install - does not catch rebootnow (Resolved, 2019-04-22)

History

#1 Updated by coolo almost 3 years ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Infrastructure)

#2 Updated by nicksinger almost 3 years ago

  • Status changed from New to Workable

#3 Updated by okurz almost 2 years ago

coolo, do you think we should still try to tinker with these variables? I don't think the mentioned problems are relevant anymore, but of course we can still try to improve things based on vm options.

#4 Updated by coolo almost 2 years ago

What makes you think Linux's memory management got any better since?

#5 Updated by okurz almost 2 years ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz
  • Target version set to Current Sprint

#6 Updated by okurz almost 2 years ago

  • Due date set to 2019-11-05

merged. Let's monitor if it has any measurable impact.

#7 Updated by coolo almost 2 years ago

  • Related to action #58805: [infra]Severe storage performance issue on openqa.suse.de workers added

#8 Updated by coolo almost 2 years ago

It had - I'm going to increase it to 10%/5% again. This is still 50% of the default, but way above the current settings

#9 Updated by coolo almost 2 years ago

I asked the DLs for 5 SSDs, let's see :)

#10 Updated by okurz almost 2 years ago

You did https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/215 and called it "Increase the dirty buffer size". However, I believe you are actually decreasing it, as the values are lower than the default.

I have good experience with the following:

# https://askubuntu.com/questions/157793/why-is-swap-being-used-even-though-i-have-plenty-of-free-ram
# https://askubuntu.com/questions/440326/how-can-i-turn-off-swap-permanently
# https://superuser.com/questions/1115983/prevent-system-freeze-unresponsiveness-due-to-swapping-run-away-memory-usage
vm.dirty_background_ratio = 5
vm.dirty_ratio = 80
# okurz: 2019-01-04: Trying to prevent even more stuttering
# vm.swappiness = 10
# https://rudd-o.com/linux-and-free-software/tales-from-responsivenessland-why-linux-feels-slow-and-how-to-fix-that
vm.swappiness = 1
# did not actually experiment with finding a good value, just took the one from the above webpage
vm.vfs_cache_pressure = 50

As an alternative, we can say that whenever we hit problems due to this we simply need to buy more RAM.

WDYT?

#11 Updated by okurz almost 2 years ago

  • Related to action #50615: [functional][y] test fails in await_install - does not catch rebootnow added

#12 Updated by coolo almost 2 years ago

You don't understand the problem, I'm afraid. This has nothing to do with RAM nor with swap.

#13 Updated by okurz almost 2 years ago

OK, maybe I was misleading by mentioning the part about swap or thrashing. It's not about memory depletion, for sure. So let me simply ask: did you not decrease the values below the default now?

#14 Updated by coolo almost 2 years ago

The default is 10% of memory, which is about 26GB. Our initial hit was at 200MB (which is less than 1% of the default), which was too small. Now we're at 5% of memory, which is somewhere in the middle.
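
The arithmetic behind this comparison can be checked with a short Python sketch. The total memory figure is an assumption inferred from the statement that 10% of memory is about 26GB; the helper names are made up for this example.

```python
GB = 10**9  # decimal units, matching the rough figures in the thread
MB = 10**6

# From the comment: the default of 10% of memory is about 26 GB.
DEFAULT_LIMIT = 26 * GB
# Assumption: total memory inferred from that statement (10% == 26 GB).
TOTAL_MEMORY = DEFAULT_LIMIT * 10


def percent_of(part: int, whole: int) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole


def limit_for_ratio(ratio_percent: int, total: int = TOTAL_MEMORY) -> int:
    """Byte limit that a percentage-of-memory setting corresponds to."""
    return total * ratio_percent // 100


if __name__ == "__main__":
    initial = 200 * MB          # the first attempt, 200 MB
    current = limit_for_ratio(5)  # the setting after the increase, 5%
    print(f"initial: {percent_of(initial, DEFAULT_LIMIT):.2f}% of the default")
    print(f"current: {current / GB:.0f} GB, "
          f"{percent_of(current, DEFAULT_LIMIT):.0f}% of the default")
```

Under these assumptions, 200MB comes out below 1% of the default and the 5% setting corresponds to roughly 13GB, half of the default.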

#15 Updated by okurz almost 2 years ago

  • Status changed from Feedback to Resolved

Exactly. Anyway, I guess we can call this solved then. Adjusting the values is easy now and we can also make it smart when necessary.
