Windows – What scale of data loss or corruption do I risk if I enable the write buffer on a file server?

buffer · corruption · data-loss · electrical-power · windows

I have found plenty of articles online warning of risk of data loss or corruption for drives with write-buffer enabled in the event of power loss. However, I haven't found any that actually refer to the scale of the risk.

I'm looking to build a mirrored file server in Storage Spaces on Windows Server 2016 for a small video-editing office. Performance is very important (hence the write-buffer consideration), and the server would mostly handle two types of important writes: uploading footage, and saving project or document files.

This leads me to wonder what the worst case scenario would be in the event of unexpected power loss.

For uploading footage, I would expect any interruption to the server to cause a visible network failure for any file transfer in progress. Therefore, unless the power failure occurred seconds after the network portion of the transfer completed, the user would be aware of the need to restart the transfer once the server was back online. Since I would know the server had gone down, I could advise the office to use a sync program, which would presumably overwrite any corrupted files with the local master copies.

As for saving documents and project files, most of them should be so tiny as to have minimal risk of even being in the buffer at the time of failure. And if that wasn't the case, having autosaves or an open version still on the user's computer would give them a second chance. The only risk I can really see is if the power failure occurred right as they saved and closed the file, and that program didn't store rolling autosaves.

Is my assessment accurate, or have I overlooked something? Can corruption in this situation affect more data than that which was being written?

Thanks

Edit: I should stress that I'm not particularly looking for conclusions about what I should do in this scenario. I merely want to properly understand the possibilities so I can make an informed decision on the reality of this risk.

The many web pages I've read on the issue so far have been frustratingly ambiguous, particularly in regards to differentiating between 'write caching' and 'write-cache buffer flushing'.

Best Answer

You have to distinguish between an enabled write buffer and disabled buffer flushes. To fully understand the difference, let's start from the basics.

HDDs and SSDs almost universally have a private DRAM cache used to briefly store and coalesce incoming writes, greatly speeding up write performance. As a reference, a fast SATA SSD can sustain >500 MB/s of sequential writes with its buffer enabled, but only ~5 MB/s with the buffer disabled. HDDs show less severe degradation, but it is still significant.
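If you want a rough feel for how expensive forcing every write to stable storage is, here is a minimal sketch (the file name and sizes are made up, and the exact numbers depend heavily on the OS, the filesystem, and whether flush commands are actually propagated to the drive). It times buffered sequential writes against writes that are flushed after every block:

```python
import os
import time

# Hypothetical test file on the volume you want to measure.
PATH = "testfile.bin"
BLOCK = b"\0" * (1024 * 1024)   # 1 MiB per write
BLOCKS = 256                    # 256 MiB total

def run(flush_each_write: bool) -> float:
    start = time.perf_counter()
    with open(PATH, "wb") as f:
        for _ in range(BLOCKS):
            f.write(BLOCK)
            if flush_each_write:
                f.flush()                 # drain application/OS buffers...
                os.fsync(f.fileno())      # ...and ask the OS to flush to the device
    elapsed = time.perf_counter() - start
    os.remove(PATH)
    return (BLOCKS * len(BLOCK)) / elapsed / 1e6  # MB/s

print(f"buffered writes  : {run(False):8.1f} MB/s")
print(f"fsync every block: {run(True):8.1f} MB/s")
```

This measures the combined cost of OS-level and device-level flushing rather than the drive's DRAM cache in isolation, but the gap it shows is the same effect described above.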

At the same time, if these private DRAM caches are not power-loss protected, severe data corruption (up to losing the entire filesystem) can happen. To prevent this without totally destroying performance, a few possibilities exist:

  • use drives with power-loss-protected write caches (i.e. enterprise SSDs and some newer NV-enabled mechanical HDDs)
  • use a hardware RAID controller with a power-loss-protected cache, and disable the disks' private DRAM caches
  • use cheap consumer hardware with the unprotected DRAM cache enabled, but issue periodic flushes to guarantee filesystem consistency (but not data consistency, as the performance impact of that would be too great)

When using software-RAID-like approaches (e.g. Linux MD RAID, ZFS, Storage Spaces, etc.) you should never disable the disk caches, unless you are ready to pay a very high performance cost. Rather, your best bet is to leave the write cache enabled and let your OS/filesystem issue DRAM sync/flush commands whenever it wants. In this manner, you gain the performance speedup of the enabled cache without risking nuking your entire filesystem. Please note that application data is not automatically protected: any application wanting to ensure data durability must issue flushes itself (databases are a good example).
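To make "the application must issue flushes itself" concrete, here is a minimal sketch (the file names are hypothetical) of how an application might durably save its own data, using the common write-to-temp, flush, then rename pattern:

```python
import os

def save_durably(path: str, data: bytes) -> None:
    """Write data and only return once it should survive a power loss."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()                # drain the application's own buffers to the OS
        os.fsync(f.fileno())     # ask the OS to flush down to stable storage
    os.replace(tmp, path)        # atomically swap the new file into place

save_durably("project.dat", b"important project state")
```

A fully paranoid POSIX version would also fsync the containing directory after the rename; the point here is simply that durability of application data is the application's responsibility, not something the disk cache gives you for free.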

On the other hand, you should NEVER disable DRAM cache flushing unless you are absolutely sure your drives/RAID card have a protected write-back cache. Even in that case, leaving flushes enabled does little harm, as almost any recent drive/card simply ignores flush commands when its protected DRAM cache is in a healthy state.