SATA Disks – Handling Write Caching Properly

cache, data integrity, hard drive, sata

It's pretty common to see advice to disable the write cache on individual disks used for databases because otherwise some disks will acknowledge writes that haven't yet made it to the disk surface.

This implies that some disks don't acknowledge writes until they've made it to the disk surface (Update: or that they report accurately when asked to flush the cache). Where can I find such disks, or where can I look for authoritative information on where to find such disks?

I'm setting up some DB servers that would really benefit from write caching, but the application is price sensitive, and I'd rather not double the cost of my disk subsystem for a caching RAID controller just because I don't have enough information to know whether I can trust the cache in each drive.

Best Answer

Generally speaking, and in direct answer to your question: I am not aware of any major brand of SATA drive that has had bugs in the drive itself affecting proper operation with write caching enabled. That is, from the drive's perspective alone, the drive does what it is supposed to do with its cache. I would also note that even with write caching enabled, the delay between a write crossing the SATA cable and the rotating media physically being updated is still very short (typically ~50 to 100 ms). Dirty cache data does not just sit there for seconds at a time; the drive is continually trying to get dirty data from the cache onto the physical media as soon as it can. This is not only a question of data safety but also of being ready to accept future writes without delay (i.e., write posting).

The issue that arises when caching is enabled is that the order in which writes are sent to the drive over the SATA cable and the order in which they reach the rotating media are not the same. This can never cause a problem UNLESS you have a loss of power or a system crash before all contents of the cache make it to disk. Why?

The issue that can arise here concerns how robust the transactions of the file system and/or database are against these lost, out-of-order writes. In effect, those lost out-of-order writes can theoretically corrupt the integrity of the transaction logic, which would otherwise have been guaranteed by the disk writes reaching the media in a very specific order.
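To make the ordering problem concrete, here is a toy model (not a description of any real drive's firmware) that enumerates which blocks could be on the media at the moment power is lost. The function name `possible_media_states`, the block labels "data" and "commit", and the `barriers` parameter are all made up for illustration. The model assumes the drive may persist cached writes in any order, except that nothing issued after a completed cache flush can reach the media before everything issued ahead of that flush.

```python
from itertools import combinations


def possible_media_states(writes, barriers=()):
    """Enumerate every set of blocks that could be on media at power loss.

    writes:   block names in the order the host issued them.
    barriers: indices i meaning "a cache flush completed before writes[i]
              was issued" (hypothetical parameter for this toy model).
    """
    # Split the issued writes into segments separated by flushes.
    segments = []
    start = 0
    for b in sorted(barriers):
        segments.append(writes[start:b])
        start = b
    segments.append(writes[start:])

    states = set()
    done = frozenset()  # blocks guaranteed on media by earlier flushes
    for seg in segments:
        # Within a segment the drive may reorder freely, so at power loss
        # any subset of the segment may have reached the media.
        for r in range(len(seg) + 1):
            for combo in combinations(seg, r):
                states.add(done | frozenset(combo))
        done = done | frozenset(seg)
    return states


# Without a flush, the commit record can survive without its data.
no_flush = possible_media_states(["data", "commit"])
# With a flush between the two writes, that state is impossible.
with_flush = possible_media_states(["data", "commit"], barriers=(1,))
```

In this sketch the dangerous state is the one containing only "commit": the transaction looks finished, but the data it refers to never reached the media.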

Now, of course, the designers of file systems, databases, RAID controllers, etc. are aware (or certainly should be aware) of this phenomenon. Write caching is extremely desirable from a performance standpoint in most random-access I/O scenarios. In fact, having write caching available is a key element of getting any real benefit from Native Command Queuing (NCQ), which is supported on newer SATA drives (the last few generations of PATA drives had a comparable tagged command queuing feature).

So, to guarantee ordering on the physical media at certain critical times, the file system and/or application can specifically request a flush of the write caches to the media. At the completion of this sync request, everything pending in (potentially) file buffers, OS disk caches, physical disk caches, etc. is actually out on the media, as the transaction design requires at those critical operations. That is, this happens correctly if the programmers make the right call(s) at the top AND every element of this chain of software and hardware layers does its job correctly, i.e., there are no bugs in this regard in the drive, the RAID controllers, the disk drivers, the OS caches, the file system, the database engine, etc. This is a lot of software that all has to work exactly right.

Additionally, verifying correctness here is very difficult, because in almost any normal situation the write order doesn't matter at all, and power-failure and crash scenarios are difficult tests to construct. So, in the end, "turning off write caching" at one or more of the various layers (and in one or more of the various meanings of the term) has the reputation of "fixing" certain kinds of issues. In effect, shutting off the write caching behavior of the RAID controller, the OS disk caches, the drive, etc. avoids one or more bugs somewhere in the system, and that is the source of such lore.
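As a rough illustration of "making the right call at the top", here is a minimal Python sketch of an application appending a record and then requesting a flush before treating the record as durable. The helper name `durable_append` and the file name in the usage note are hypothetical. Note the assumption this relies on: `os.fsync` only asks the operating system to push the data down the stack, and whether that request actually reaches the drive's own cache depends on the OS, the file system, its mount options, and the drive honoring the flush. On macOS, for example, `fsync` alone does not flush the drive cache.

```python
import os


def durable_append(path, payload):
    """Append payload (bytes) to path and request a flush before returning.

    Hypothetical helper for illustration; durability still depends on every
    layer below the application handling the flush request correctly.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush file data and metadata toward stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A database-style caller would do something like `durable_append("wal.log", b"commit 42\n")` and only acknowledge the transaction after the call returns.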

Anyway, getting back to the core of the question: under SATA, the handling of all the disk read/write commands and the flush cache commands is well defined by the SATA specifications. Additionally, the drive manufacturers should have detailed documentation for each drive model or drive family describing their implementation of, and compliance with, these rules, like this example for Seagate Barracuda drives. In particular, see the details of the SET FEATURES command, which controls the drive's operational mode: option 82h disables write caching at the drive level, which matters because write caching is enabled by default on all drives I am aware of. If you really want to disable the cache, this command has to be issued after each drive reset or power-up, and it is typically under the control of your operating system's disk driver. You might be able to encourage your OS driver to set this mode via an IOCTL and/or a registry setting, but this varies widely.
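As one example of how OS-specific this is, a reasonably recent Linux kernel exposes its own view of a block device's cache mode through sysfs. The sketch below just reads that attribute. It assumes Linux, the function name `kernel_write_cache_mode` is made up, and "sda" in the usage note is a placeholder device name. It reports what the kernel believes about the device, not the result of querying the drive's SET FEATURES state directly.

```python
import os


def kernel_write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's reported cache mode for a block device.

    Typically "write back" or "write through" on Linux; returns None if
    the attribute is missing (other OS, older kernel, or unknown device).
    """
    path = os.path.join(sysfs_root, device, "queue", "write_cache")
    try:
        with open(path, "r", encoding="ascii") as f:
            return f.read().strip()
    except OSError:
        return None
```

Calling `kernel_write_cache_mode("sda")` on a typical Linux box with a SATA disk would be expected to return "write back" when the drive's write cache is enabled. Actually changing the drive setting is a separate step; on Linux, the `hdparm` tool's `-W` option is one common way to do it.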
