Linux – Tuning Linux disk caching behaviour for maximum throughput

cache, ftp, linux, storage

I'm running into a maximum-throughput issue here and need some advice on which way to tune my knobs. We're running a 10 Gbit fileserver for backup distribution. It's a two-disk SATA2 setup on an LSI MegaRAID controller, and the server also has 24 GB of memory.

We have a need to mirror our last uploaded backup with maximum throughput.

The RAID0 for our "hot" backups gives us around 260 MB/sec write and 275 MB/sec read. A 20 GB tmpfs we tested gives us around 1 GB/sec. That is the kind of throughput we need.

Now how can I tune the virtual memory subsystem of Linux to cache the last uploaded files for as long as possible in memory without writing them out to disk (or even better: writing to disk AND keeping them in memory)?

I set up the following sysctls, but they don't give us the throughput we expect:

# VM pressure fixes
vm.swappiness = 20
vm.dirty_ratio = 70
vm.dirty_background_ratio = 30
vm.dirty_writeback_centisecs = 60000

In theory this should give us 16 GB for caching I/O and wait several minutes before writing to disk. Still, when I benchmark the server, I see no effect on writes; the throughput doesn't increase.
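For reference, whether the settings actually took effect, and whether dirty pages really accumulate during a benchmark, can be checked through procfs (standard Linux paths):

```shell
# Confirm the values the kernel is actually using
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# While the benchmark runs, watch dirty pages accumulate; if "Dirty:"
# stays near zero, writes are being flushed immediately (or are synchronous)
grep -E '^(Dirty|Writeback):' /proc/meminfo
```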

Help or advice needed.

Best Answer

By the look of the variables you've set, it seems you are mostly concerned with write performance and do not care about possible data loss due to power outages.

You will only ever get lazy writes and a writeback cache with asynchronous write operations. Synchronous writes require a commit to disk and will never be lazy-written. Your filesystem might be causing frequent page flushes and synchronous writes (typically due to journalling, especially with ext3 in data=journal mode). Additionally, even "background" page flushes will interfere with uncached reads and synchronous writes, slowing them down.

In general, you should collect some metrics to see what is happening: is your copy process put into the "D" state, waiting for I/O work to be done by pdflush? Do you see heavy synchronous write activity on your disks?
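Both checks can be done with standard tools (a diagnostic sketch; note that on modern kernels the flusher shows up as kworker/flush threads rather than pdflush):

```shell
# Any process in uninterruptible sleep ("D") is blocked waiting on I/O
ps -eo pid,stat,comm | awk '$2 ~ /^D/'

# Raw per-device I/O counters; sample twice and diff the
# "sectors written" column to see ongoing write activity
cat /proc/diskstats
```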

If all else fails, you might choose to set up an explicit tmpfs filesystem where you copy your backups to, and just synchronize the data with your disks after the fact, even automatically using inotify.
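A minimal sketch of that setup (mount point, size, and destination path are illustrative; the mount requires root, and the rsync would typically be triggered by a timer or an inotify watch):

```shell
# RAM-backed staging area for incoming backups, sized to hold one backup set
mount -t tmpfs -o size=20g tmpfs /mnt/backup-staging

# After an upload completes, drain the staged files to the RAID at leisure
rsync -a --remove-source-files /mnt/backup-staging/ /data/backups/
```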

For read caching, things are significantly simpler: the fcoretools fadvise utility has the --willneed parameter to advise the kernel to load a file's contents into the buffer cache.
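If that utility is not packaged for your distribution, the same POSIX_FADV_WILLNEED advice can be issued from Python's standard library (the path below is illustrative):

```shell
python3 - <<'EOF'
import os
# Ask the kernel to read the whole file into the page cache ahead of time
fd = os.open("/data/backups/latest.tar", os.O_RDONLY)  # illustrative path
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)     # length 0 = to EOF
os.close(fd)
EOF
```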

Edit:

vm.dirty_ratio = 70

This should in theory give us 16GB for caching I/O and wait some minutes until its writing to disk.

This would not have greatly influenced your testing scenario, but there is a misconception in your understanding. The dirty_ratio parameter is not a percentage of your system's total memory but of the memory available to the page cache (free plus reclaimable pages).
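A quick back-of-the-envelope comparison (the 20 GB "available" figure is illustrative; it is whatever free plus reclaimable memory the kernel sees at the time):

```shell
total_gb=24
available_gb=20   # illustrative: free + reclaimable pages, not total RAM
echo "naive expectation (70% of total):     $((total_gb * 70 / 100)) GB"
echo "closer to reality (70% of available): $((available_gb * 70 / 100)) GB"
```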

There is an article about Tuning for Write-Heavy loads with more in-depth information.