action #20914

[tools] configure vm settings for workers with rotating discs

Added by coolo over 2 years ago. Updated 4 months ago.

Status: Resolved
Start date: 28/07/2017
Priority: Normal
Due date: 05/11/2019
Assignee: okurz
% Done: 0%
Category: -
Target version: openQA Project - Current Sprint
Duration: 593

Description

The aarch64 machines in particular are too slow syncing qemu, so we need to tweak their configs in salt.

This will cost performance - and possibly make the 'HMP timeout' issue more prominent - but it will also make needle matching more predictable.

Jan Kara's recommendation is:
dirty_bytes to 200000000 (~200 MB) and
dirty_background_bytes to 50000000 (~50 MB).

after the experiments in https://github.com/os-autoinst/os-autoinst/pull/664
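
For reference, these values expressed as a sysctl drop-in would look as follows (a sketch; the file name is hypothetical, any file under /etc/sysctl.d/ works). Note that the kernel treats the _bytes and _ratio variants as mutually exclusive: setting one zeroes its counterpart.

# /etc/sysctl.d/99-dirty-limits.conf (hypothetical file name)
# setting vm.dirty_bytes clears vm.dirty_ratio, and vice versa
vm.dirty_bytes = 200000000
vm.dirty_background_bytes = 50000000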

We only need this for the HDD hosts; having it on NVMe shouldn't hurt, but I can't really say.


Related issues

Related to openQA Infrastructure - action #58805: [infra] Severe storage performance issue on openqa.suse.de workers - Feedback 29/10/2019
Related to openQA Tests - action #50615: [functional][y] test fails in await_install - does not catch rebootnow - Resolved 22/04/2019

History

#1 Updated by coolo over 1 year ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Infrastructure)

#2 Updated by nicksinger over 1 year ago

  • Status changed from New to Workable

#3 Updated by okurz 5 months ago

@coolo do you think we should still try to tinker with these variables? I don't think the mentioned problems are relevant anymore, but of course we can still try to improve based on vm.* options.

#4 Updated by coolo 5 months ago

What makes you think Linux's memory management got any better since?

#5 Updated by okurz 4 months ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz
  • Target version set to Current Sprint

#6 Updated by okurz 4 months ago

  • Due date set to 05/11/2019

Merged. Let's monitor whether it has any measurable impact.

#7 Updated by coolo 4 months ago

  • Related to action #58805: [infra] Severe storage performance issue on openqa.suse.de workers added

#8 Updated by coolo 4 months ago

It did - I'm going to increase it to 10%/5% again. That is still 50% of the default, but way above the current settings.
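
For reference, 10%/5% expressed as sysctl settings would be (a sketch; note that setting the _ratio variants zeroes any _bytes values set earlier):

vm.dirty_ratio = 10
vm.dirty_background_ratio = 5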

#9 Updated by coolo 4 months ago

I asked the DLs for 5 SSDs, let's see :)

#10 Updated by okurz 4 months ago

You did https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/215 and called it "Increase the dirty buffer size", however I believe you are actually decreasing it, as the values are lower than the default.

I have good experience with the following:

# https://askubuntu.com/questions/157793/why-is-swap-being-used-even-though-i-have-plenty-of-free-ram
# https://askubuntu.com/questions/440326/how-can-i-turn-off-swap-permanently
# https://superuser.com/questions/1115983/prevent-system-freeze-unresponsiveness-due-to-swapping-run-away-memory-usage
vm.dirty_background_ratio = 5
vm.dirty_ratio = 80
# okurz: 2019-01-04: Trying to prevent even more stuttering
# vm.swappiness = 10
# https://rudd-o.com/linux-and-free-software/tales-from-responsivenessland-why-linux-feels-slow-and-how-to-fix-that
vm.swappiness = 1
# did not actually experiment with finding a good value, just took the one from the above webpage
vm.vfs_cache_pressure = 50
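
For completeness, applying and verifying such a drop-in (a sketch; sysctl --system reloads all configured files):

sudo sysctl --system
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.swappiness vm.vfs_cache_pressure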

As an alternative, we can say that whenever we hit problems due to this we simply need to buy more RAM.

WDYT?

#11 Updated by okurz 4 months ago

  • Related to action #50615: [functional][y] test fails in await_install - does not catch rebootnow added

#12 Updated by coolo 4 months ago

You don't understand the problem, I'm afraid. This has nothing to do with RAM, nor with swap.

#13 Updated by okurz 4 months ago

OK, maybe I was misleading by mentioning the part about swap or thrashing. It's certainly not about memory depletion. So let me simply ask: did you not decrease the values below the default now?

#14 Updated by coolo 4 months ago

The default is 10% of memory, which is about 26 GB. Our initial hit was at 200 MB (which is less than 1% of default), which was too small. Now we're at 5% of memory, which is somewhere in the middle.
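
As a rough sanity check of the magnitudes involved (assuming roughly 260 GB of RAM, as implied by "10% of memory is about 26 GB"):

10% of 260 GB ≈ 26 GB  (the default)
200 MB / 26 GB ≈ 0.8%  (hence "less than 1% of default")
5% of 260 GB ≈ 13 GB   (the current setting)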

#15 Updated by okurz 4 months ago

  • Status changed from Feedback to Resolved

Exactly. Anyway, I guess we can call this solved then. Adjusting the values is easy now, and we can also make it smarter when necessary.
