CentOS – Should we mount with data=writeback and barrier=0 on ext3

Tags: centos, ext3, mount, performance-tuning, write-barrier

We've been running a server on a VM at a hosting company, and have just signed up for a dedicated host (AMD Opteron 3250, 4 cores, 8GB RAM, 2 x 1TB in software RAID, ext3).

While running performance tests, we noticed that some SQLite transactions (a combination of inserts, deletes and/or updates) were taking 10x to 15x longer than on my 2010 MacBook Pro.

After lots of googling and reading, we got to look at the mount options, which were:

    data=ordered,barrier=1

We've done some experimenting, and got best performance with

    data=writeback,barrier=0

I've read up on these and understand the basics of what they do, but I don't have a good feel for whether it's a good idea for us to run like this.

Questions

Is the above config even sensible to consider for a hosted service?

If we had a power outage, or hard crash, then we might end up with data being lost, or files corrupted. If we were taking snapshots of the DB every 15 minutes, that might mitigate the situation, but the DB might not be sync'd when the snapshot is taken. How should (can?) we ensure the integrity of such a snapshot?

Are there other options we should be considering?

Thanks

Best Answer

First advice
If you cannot afford to lose any data (meaning that once a user has entered new data, it must not be lost within the following seconds), and since you do not have something like a UPS, I would neither remove the write barrier nor switch to writeback.

Removing write barriers
If you remove write barriers, then after a crash or power loss the file system will need an fsck to repair the on-disk structure (note that even with barriers on, most journaling file systems would still run an fsck, even though replaying the journal should have been sufficient). When removing write barriers, it is advisable to disable the drive's hardware write cache if possible, which helps minimize the risk. You should benchmark the impact of such a change, though. You can try this command (if your hardware supports it): hdparm -W0 /dev/<your HDD>.
Note that ext3 uses two barriers per metadata change, whereas ext4 uses only one when mounted with the journal_async_commit option.
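
Before and after changing anything, it is worth confirming which options the mount really uses. Here is a minimal Python sketch that parses text in the /proc/mounts format; the function name parse_mount_options is my own, and keep in mind that the kernel may not list options that are at their default value, so a missing key does not prove the option is off.

```python
def parse_mount_options(mounts_text, mount_point):
    """Return the mount options of `mount_point` as a dict, or None.

    `mounts_text` is text in the /proc/mounts format: one mount per line,
    whitespace-separated fields (device, mount point, fs type, options, ...).
    Options like "data=ordered" map to {"data": "ordered"}; bare flags
    like "rw" map to True.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = {}
        for item in fields[3].split(","):
            key, sep, value = item.partition("=")
            options[key] = value if sep else True
        found = options  # a later entry for the same mount point wins
    return found
```

On the server you would feed it the content of /proc/mounts, for example parse_mount_options(open("/proc/mounts").read(), "/"), and look at the "data" and "barrier" keys.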

Although Ted Ts'o explained why a few cases of data corruption happened in the early days of ext3 (barriers were off by default until kernel 3.1), the journal is laid out in such a way that, unless a journal wrap happens (the journal is a cyclic log), data gets written to disk in a safe order, journal first and data second, even when the hard disk supports reordering of writes.
In other words, you would have to be unlucky for a system crash or power loss to happen exactly when the journal wraps. However, you need to keep data=ordered. Try benchmarking data=ordered,barrier=0 as well.
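
To compare option combinations on equal terms, a small repeatable test helps. The following is only a sketch of such a benchmark in Python, assuming your workload looks like many small committed SQLite transactions; the function name, table layout and default counts are made up for illustration and should be replaced by something closer to your real transactions.

```python
import os
import sqlite3
import tempfile
import time


def time_transactions(directory, transactions=50, rows_per_txn=20):
    """Time small committed SQLite transactions on the filesystem
    that holds `directory`. Returns (elapsed_seconds, row_count)."""
    fd, path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        # FULL makes SQLite sync at the critical points of every commit,
        # which is where barriers and the journal mode matter.
        conn.execute("PRAGMA synchronous = FULL")
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()
        start = time.perf_counter()
        for txn in range(transactions):
            with conn:  # commits when the block exits without error
                conn.executemany(
                    "INSERT INTO t (payload) VALUES (?)",
                    [(f"txn{txn}-row{i}",) for i in range(rows_per_txn)],
                )
        elapsed = time.perf_counter() - start
        row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()
        os.remove(path)
    return elapsed, row_count
```

Run it against a directory on the file system under test after each remount, several times per configuration, and compare the elapsed times rather than trusting a single run.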

If you can afford to lose a few seconds of data, you could enable both options, data=writeback,barrier=0, and then experiment with the commit=<nrsec> parameter as well. Check the manual for this parameter here. Basically, you give a number of seconds, which is the interval at which the ext3 file system syncs its data and metadata.
You could also try to tune and benchmark some kernel tunables for dirty pages (pages that still need writing to disk); there is a good article here that explains these tunables and how to adjust them.
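
For the dirty page tunables, it helps to record the current values next to each benchmark result. This is a small Python sketch under the assumption that the usual files under /proc/sys/vm are the ones you want to track; the selection of four names and the function name are my choice, not a complete list.

```python
import os

# Linux writeback tunables exposed as files under /proc/sys/vm.
DIRTY_TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_tunables(base="/proc/sys/vm"):
    """Return {tunable name: int value}, with None for any file that
    is missing or does not contain an integer."""
    values = {}
    for name in DIRTY_TUNABLES:
        try:
            with open(os.path.join(base, name)) as handle:
                values[name] = int(handle.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values
```

Calling read_dirty_tunables() with no argument reads the live values; the base parameter only exists so the function can be pointed at a copy of those files.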

Summary regarding barriers
You should benchmark a few more combinations of tunables:

  1. Use data=writeback,barrier=0 in conjunction with hdparm -W0 /dev/<your HDD>
  2. Use data=ordered,barrier=0
  3. Use data=writeback,barrier=0 in conjunction with the other mount option commit=<nrsec> and try different values for nrsec
  4. Use option 3 and additionally try kernel-level tunables for dirty pages.
  5. Use the safe data=ordered,barrier=1, but try other tunables: especially the I/O elevator (CFQ, Deadline or Noop) and their respective tunables.
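
For combination 5, you will want to note which elevator was active during each run. The kernel exposes this in /sys/block/<device>/queue/scheduler as a list with the active entry in square brackets. A minimal Python sketch that extracts it (the function name is hypothetical):

```python
def active_scheduler(scheduler_text):
    """Return the active elevator from the text of
    /sys/block/<device>/queue/scheduler, e.g. "noop deadline [cfq]".
    Returns None if no entry is bracketed."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None
```

You would pass it the content of the scheduler file of each disk in the RAID, since the elevator is set per block device.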

Considering moving to ext4 and benchmarking it
As said, ext4 requires fewer barriers than ext3 for a write. Furthermore, ext4 supports extents, which may bring better performance for large files. So it is an option worth exploring, especially since it is easy to migrate from ext3 to ext4 without reinstalling: official documentation; I did that on one system, but using this Debian guide. Ext4 has been really stable since kernel 2.6.32, so it is safe to use in production.

Last consideration
This answer is far from complete, but it gives you enough material to start investigating. So much depends on requirements (at the user or system level) that it is hard to give a straightforward answer, sorry about that.