RAID 10 Stripe Size for XenServer

Tags: optimization, raid, raid10, xenserver

Below is our current server configuration. In a few weeks I will be simulating a disaster recovery by installing 5 new disks (1 hot spare) and restoring all VMs from the backups.

Will I gain anything by changing the RAID stripe size to something other than 64KB? The RAID controller has options for 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB.

Any recommendations based on the specification below would be greatly appreciated – thanks.

Hardware:

Dell PowerEdge 2900 III
Dell PERC 6/i
Intel Xeon 2.5GHz (x2)
32GB RAM
Seagate ST32000645SS ES.2 2TB Near-Line SAS 7.2K (x4)

Software:

Citrix XenServer 6.2 SP1
VM - Windows SBS 2008 x64 - Exchange & multiple SQL express instances
VM - Windows Server 2003 R2 x86 - single SQL express instance
VM - CentOS 6.6 x64 (x2) - cPanel & video transcoding and streaming
VM - CentOS 6.3 x86 - Trixbox (VoIP)
VM - PHD Virtual Backup 6.5.3 (running Ubuntu 12.04.1 LTS)

Configuration:

RAID 10, 64 KB stripe size

Best Answer

I am going to try to sum up my comments into an answer. The bottom line is:

You should not tinker with the strip size unless you have good evidence that it will benefit your workload.

Reasoning:

  • For striping, you have to choose some strip size, and 64 KB is the default the manufacturer has chosen. As the manufacturer (LSI in this case, rebranded by Dell) has a vast amount of experience running a huge number of setups with different RAID levels and workloads, you might just trust them to have chosen wisely.
  • 64 KB is likely to roughly match the average request size in a virtualized environment (at least much more so than 256 KB or 1 MB) and thus be a good trade-off between transfer latency and seek time optimization¹.
  • Accurate model-driven predictions about application performance with varying strip sizes are close to impossible due to the highly variable nature of workloads and the complexity of models that would have to account for the different read-ahead and caching algorithms at each layer of the stack.

If you do want to gather that evidence, you can do so by running your typical load, plus some of the atypical load scenarios, against different strip-size configurations, collecting the data (I/O subsystem performance at the XenServer layer, backend server performance and response times at the application layer) and running it through a statistical evaluation. This will be extremely time-consuming, however, and is not likely to produce any groundbreaking result beyond "I might as well have left it at the default values", so I would consider it a waste of resources.
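
If you do go down that road, the evaluation step itself is trivial compared to the benchmarking. Below is a minimal Python sketch of what comparing two strip-size runs could look like; the file names, the "one IOPS sample per line" CSV layout, and the choice of a plain mean/standard-deviation comparison are assumptions for illustration, not part of the original setup.

    # Minimal sketch: compare IOPS samples gathered under two strip-size
    # configurations (e.g. 64 KB vs 256 KB). File names and CSV layout
    # ("one IOPS sample per line") are hypothetical.
    import csv
    import statistics

    def load_samples(path):
        """Read one numeric IOPS sample per line from a CSV/text file."""
        with open(path) as f:
            return [float(row[0]) for row in csv.reader(f) if row]

    def summarize(label, samples):
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
        print(f"{label}: n={len(samples)} mean={mean:.0f} IOPS stdev={stdev:.0f}")
        return mean

    if __name__ == "__main__":
        baseline = load_samples("iops_64k.csv")    # hypothetical 64 KB results
        candidate = load_samples("iops_256k.csv")  # hypothetical 256 KB results
        m_base = summarize("64 KB strip", baseline)
        m_cand = summarize("256 KB strip", candidate)
        delta = (m_cand - m_base) / m_base * 100
        print(f"Relative difference: {delta:+.1f}% "
              "(only meaningful if well outside run-to-run noise)")

The point of the sketch is that any observed difference has to be judged against the run-to-run variation of your workload, which is exactly why the exercise tends not to be worth the effort.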


¹ If you assume a transfer rate of 100 MB/s for a single disk, it is easy to see that one kilobyte takes around 0.01 ms to read, so a 64 KB strip has a read-transfer latency of about 0.64 ms. Considering that the average service time of a random I/O request will typically be in the range of 5-10 ms, that transfer latency is only a small fraction of the total wait time. On the other hand, reading 512 KB takes around 5 ms, which does matter for the "small random read" type of workload and reduces the number of IOPS your array can deliver in that specific case by a factor of roughly 1.5 to 2. A scenario with concurrent large random reads would benefit, as larger block reads induce fewer time-consuming seeks, but you are very unlikely to see that scenario in a virtualized environment.
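
To make that arithmetic concrete, here is a small Python sketch that estimates per-request service time and the resulting single-disk random-read IOPS for a few strip sizes. The 100 MB/s transfer rate matches the footnote; the 7 ms average seek plus rotational delay is an illustrative assumption in the middle of the 5-10 ms range quoted above, not a measured value for these drives.

    # Rough estimate of how strip size affects random-read service time,
    # using the footnote's numbers: ~100 MB/s sequential transfer and an
    # assumed ~7 ms average seek + rotational delay (illustrative only).
    TRANSFER_MB_S = 100.0   # assumed sequential transfer rate per disk
    SEEK_MS = 7.0           # assumed average seek + rotational latency

    def service_time_ms(strip_kb):
        """Seek/rotation plus the time to transfer one strip of data."""
        transfer_ms = strip_kb / 1024.0 / TRANSFER_MB_S * 1000.0
        return SEEK_MS + transfer_ms

    for kb in (64, 128, 256, 512, 1024):
        t = service_time_ms(kb)
        iops = 1000.0 / t
        print(f"{kb:>5} KB strip: ~{t:.2f} ms per random read, ~{iops:.0f} IOPS/disk")

Under these assumptions, going from 64 KB to 512 KB drops a single disk from roughly 130 to roughly 85 random-read IOPS, which is the factor of 1.5 to 2 mentioned in the footnote.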