Linux RAID5 Optimal Chunk Size


I bring up, yet again, the ever-present question of how best to optimize disk structures. In my organization, we have a 14TB Linux software RAID array dedicated to storing backups made using Symantec Backup Exec. These are large files, 10GB – 100GB each, with some supporting metadata files a couple of KB in size. Long story short, we have to recreate the array, and I would like to know the optimal array chunk size for this use case.

Details of our setup:

A Netgear ReadyNAS Pro, running a clean & updated install of CentOS 6.4.

6 x 3TB consumer (SATA II, 7200 RPM) hard drives from assorted vendors (identical in size).

Each drive has 3 identical partitions which form 3 software RAID devices:

  • /dev/md0: 6 x 32GB for / in a RAID6
  • /dev/md1: 6 x 4GB swap in a RAID10
  • /dev/md2: 6 x 2.7TB storage in a RAID5 for ~14TB total useful storage
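As a quick sanity check on the layout above, here is a minimal sketch of the usable-capacity arithmetic for each md array. It assumes the nominal per-partition sizes from the list, not exact byte counts, and assumes md's default RAID10 layout (two copies of every block). The function name `usable` is purely illustrative.

```python
# Approximate usable capacity for the three md arrays listed above.
# Assumes nominal partition sizes as stated in the post (not exact byte counts)
# and md's default RAID10 "near=2" layout (two copies of every block).

def usable(level: str, members: int, size: float) -> float:
    """Approximate usable capacity of an md array built from `members` equal partitions."""
    if level == "raid5":
        return (members - 1) * size   # one member's worth of space goes to parity
    if level == "raid6":
        return (members - 2) * size   # two members' worth of space go to parity
    if level == "raid10":
        return members * size / 2     # every block is stored twice
    raise ValueError(f"unsupported level: {level}")

arrays = [
    ("/dev/md0", "raid6", 6, 32, "GB"),
    ("/dev/md1", "raid10", 6, 4, "GB"),
    ("/dev/md2", "raid5", 6, 2.7, "TB"),
]

for name, level, n, size, unit in arrays:
    print(f"{name}: {level} of {n} x {size}{unit} -> ~{usable(level, n, size):g}{unit} usable")
```

This prints roughly 128GB for /, 12GB of swap and 13.5TB for /dev/md2, which the post rounds to ~14TB.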

Additionally, an integrated 128MB flash device is set up as /boot.

/dev/md2 is the array I'm focused on. It is made available as drive "R:" to a Windows Server 2008 R2 box running Symantec Backup Exec via multipath iSCSI over dual gigabit NICs on both machines (also running 9k jumbo frames).

On the Server 2008 box, R: is formatted as NTFS with a 64k cluster size, and is dedicated to storing backup files. The average file is generally between 40MB and 5GB, depending on the current proportion of full vs incrementals/differentials present. Disk usage is about a 50/50 split between read and write, as we mirror backups from this drive to tape as well.

Overall, given the hardware, I think I've optimized this setup fairly well. However, I'm not a storage expert, and the implications of the RAID chunk size are slightly beyond me. I know the default mdadm chunk size is 512KB. Is this optimal for my scenario? Should I adjust it to match the NTFS cluster size? Or is there some magic formula I've missed?
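To make the question concrete, here is a minimal sketch of the stripe geometry it implies. The inputs are a 6-member RAID5, the 512KB mdadm default chunk, and 64K NTFS clusters. On RAID5, one chunk in every stripe holds parity, so a full stripe carries `chunk × (members − 1)` of data. Writes smaller than a full stripe require md to work out new parity from data already on disk. The helper name `raid5_geometry` is hypothetical.

```python
# Stripe geometry for an md RAID5, using the figures from the question.
# In RAID5 one chunk per stripe holds parity, so data per full stripe is
# chunk size times (members - 1).

def raid5_geometry(members: int, chunk_kib: int, cluster_kib: int) -> dict:
    data_disks = members - 1                  # one member's chunk per stripe is parity
    full_stripe_kib = chunk_kib * data_disks  # data written by one full-stripe write
    return {
        "data_disks": data_disks,
        "full_stripe_kib": full_stripe_kib,
        "clusters_per_chunk": chunk_kib / cluster_kib,
        "clusters_per_full_stripe": full_stripe_kib / cluster_kib,
    }

print(raid5_geometry(members=6, chunk_kib=512, cluster_kib=64))
```

With the defaults, each chunk holds 8 NTFS clusters and a full stripe carries 2560KB of data (40 clusters).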

Thanks for any help you can provide.

Edit: Benchmark results below. Not all combinations were tested.


##########   4K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   111.551 MB/s
          Sequential Write :    96.759 MB/s
         Random Read 512KB :   107.033 MB/s
        Random Write 512KB :    56.770 MB/s
    Random Read 4KB (QD=1) :     9.500 MB/s [  2319.2 IOPS]
   Random Write 4KB (QD=1) :     5.042 MB/s [  1231.0 IOPS]
   Random Read 4KB (QD=32) :   101.717 MB/s [ 24833.3 IOPS]
  Random Write 4KB (QD=32) :     8.237 MB/s [  2010.9 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 13:10:31
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)

##########  32K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :    91.276 MB/s
          Sequential Write :    11.119 MB/s
         Random Read 512KB :     0.000 MB/s
        Random Write 512KB :     0.000 MB/s
    Random Read 4KB (QD=1) :     0.000 MB/s [     0.0 IOPS]
   Random Write 4KB (QD=1) :     0.000 MB/s [     0.0 IOPS]
   Random Read 4KB (QD=32) :     0.000 MB/s [     0.0 IOPS]
  Random Write 4KB (QD=32) :     0.000 MB/s [     0.0 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 14:37:05
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)

##########  64K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   111.968 MB/s
          Sequential Write :   103.318 MB/s
         Random Read 512KB :   105.047 MB/s
        Random Write 512KB :    48.321 MB/s
    Random Read 4KB (QD=1) :    10.373 MB/s [  2532.5 IOPS]
   Random Write 4KB (QD=1) :     5.180 MB/s [  1264.5 IOPS]
   Random Read 4KB (QD=32) :    95.106 MB/s [ 23219.3 IOPS]
  Random Write 4KB (QD=32) :     9.108 MB/s [  2223.6 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 12:47:37
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)

########## 128K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   111.908 MB/s
          Sequential Write :    94.305 MB/s
         Random Read 512KB :   104.772 MB/s
        Random Write 512KB :    43.821 MB/s
    Random Read 4KB (QD=1) :     9.247 MB/s [  2257.6 IOPS]
   Random Write 4KB (QD=1) :     4.929 MB/s [  1203.3 IOPS]
   Random Read 4KB (QD=32) :   101.764 MB/s [ 24844.8 IOPS]
  Random Write 4KB (QD=32) :     7.949 MB/s [  1940.6 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 13:52:01
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)

########## 512K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   110.237 MB/s
          Sequential Write :    93.149 MB/s
         Random Read 512KB :   104.892 MB/s
        Random Write 512KB :    41.407 MB/s
    Random Read 4KB (QD=1) :     6.760 MB/s [  1650.3 IOPS]
   Random Write 4KB (QD=1) :     3.539 MB/s [   864.0 IOPS]
   Random Read 4KB (QD=32) :   101.139 MB/s [ 24692.3 IOPS]
  Random Write 4KB (QD=32) :     7.166 MB/s [  1749.6 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 12:22:58
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)

##########1024K Chunk##########
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   112.327 MB/s
          Sequential Write :    92.353 MB/s
         Random Read 512KB :   107.015 MB/s
        Random Write 512KB :    39.793 MB/s
    Random Read 4KB (QD=1) :     9.536 MB/s [  2328.0 IOPS]
   Random Write 4KB (QD=1) :     3.671 MB/s [   896.3 IOPS]
   Random Read 4KB (QD=32) :   101.990 MB/s [ 24900.0 IOPS]
  Random Write 4KB (QD=32) :     0.000 MB/s [     0.0 IOPS]

  Test : 1000 MB [R: 0.0% (0.1/13791.8 GB)] (x5)
  Date : 2013/07/12 14:17:08
    OS : Windows Server 2008 R2 Enterprise Edition (Full installation) SP1 [6.1 Build 7601] (x64)
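The runs above are easier to compare side by side. This sketch copies four of the posted metrics (MB/s) and picks the best chunk size for each. Any 0.000 reading is treated as "no result"; the post does not say why the 32K and 1024K runs show zeros. The sequential-read spread is under 1%, so a winner on that metric may not be meaningful. The helper `best_chunk` is illustrative only.

```python
# Selected CrystalDiskMark figures (MB/s) copied from the runs above, keyed by chunk size in KB.
# A value of 0.000 is treated as "no result" rather than as a real measurement.
results = {
    4:    {"seq_read": 111.551, "seq_write": 96.759,  "rnd_write_512k": 56.770, "rnd_write_4k_qd32": 8.237},
    32:   {"seq_read": 91.276,  "seq_write": 11.119,  "rnd_write_512k": 0.000,  "rnd_write_4k_qd32": 0.000},
    64:   {"seq_read": 111.968, "seq_write": 103.318, "rnd_write_512k": 48.321, "rnd_write_4k_qd32": 9.108},
    128:  {"seq_read": 111.908, "seq_write": 94.305,  "rnd_write_512k": 43.821, "rnd_write_4k_qd32": 7.949},
    512:  {"seq_read": 110.237, "seq_write": 93.149,  "rnd_write_512k": 41.407, "rnd_write_4k_qd32": 7.166},
    1024: {"seq_read": 112.327, "seq_write": 92.353,  "rnd_write_512k": 39.793, "rnd_write_4k_qd32": 0.000},
}

def best_chunk(metric: str) -> tuple[int, float]:
    """Return (chunk size in KB, MB/s) with the highest non-zero value for `metric`."""
    valid = {chunk: r[metric] for chunk, r in results.items() if r[metric] > 0}
    chunk = max(valid, key=valid.get)
    return chunk, valid[chunk]

for metric in ("seq_read", "seq_write", "rnd_write_512k", "rnd_write_4k_qd32"):
    chunk, value = best_chunk(metric)
    print(f"{metric:>18}: best at {chunk}K chunk ({value:.3f} MB/s)")
```

On these numbers, 64K leads sequential write and 4KB QD=32 random write, and 4K leads 512KB random write.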

Best Answer

At a minimum, you want the chunk size to be a multiple or a divisor of the filesystem block size (here, the 64K NTFS cluster). You've got that: the 512K default is a multiple of 64K.
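The minimum rule above can be written as a one-line check: one size must divide the other evenly. This is a sketch; `sizes_compatible` is a hypothetical helper. When both sizes are powers of two, as every chunk size benchmarked in the question is, the check always passes.

```python
# Minimum rule from the answer: chunk size and filesystem block (NTFS cluster) size
# should divide evenly, i.e. one must be a whole multiple of the other.

def sizes_compatible(chunk_kib: int, cluster_kib: int) -> bool:
    big, small = max(chunk_kib, cluster_kib), min(chunk_kib, cluster_kib)
    return big % small == 0

cluster = 64  # NTFS cluster size from the question, in KB
for chunk in (4, 32, 64, 128, 512, 1024):  # chunk sizes benchmarked in the question
    status = "ok" if sizes_compatible(chunk, cluster) else "mismatch"
    print(f"{chunk:>5}K chunk vs {cluster}K cluster: {status}")
```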

Everything else is likely to be implementation-dependent. Since you're starting from scratch, you should roll your own benchmarks. Instead of creating a 14 TB RAID set, test with just 500 GB from each drive at various chunk sizes. Smaller volumes take much less time to create, so you can try more configurations in the same amount of time.
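A benchmarking loop like the one described could be planned with a dry-run script. This sketch prints mdadm commands but does not run them. It uses mdadm's `--size` option to take only part of each member; both `--size` and `--chunk` are given in kibibytes. Assumptions: "500 GB" is taken as 500 GiB, and the member partitions `/dev/sd[a-f]3` and the scratch array `/dev/md9` are hypothetical placeholders. Let the initial resync shown in `/proc/mdstat` finish before benchmarking. mdadm may also ask for confirmation when it finds an old superblock on the members.

```python
# Dry run: print (do not execute) mdadm commands for small RAID5 test arrays.
# Hypothetical placeholders: member partitions /dev/sd[a-f]3 and scratch array /dev/md9.
DEVICES = [f"/dev/sd{letter}3" for letter in "abcdef"]  # hypothetical: 6 member partitions
TEST_ARRAY = "/dev/md9"                                  # hypothetical scratch array name
PER_DRIVE_KIB = 500 * 1024 * 1024                        # assumption: 500 GiB per member, in KiB
CHUNKS_KIB = (4, 32, 64, 128, 512, 1024)                 # chunk sizes benchmarked in the question

def plan(chunk_kib: int) -> list[str]:
    """Commands for one test iteration at the given chunk size (KiB)."""
    create = (
        f"mdadm --create {TEST_ARRAY} --level=5 --raid-devices={len(DEVICES)} "
        f"--chunk={chunk_kib} --size={PER_DRIVE_KIB} " + " ".join(DEVICES)
    )
    return [
        create,
        "# wait for the initial resync to finish (cat /proc/mdstat), then export and benchmark",
        f"mdadm --stop {TEST_ARRAY}",
    ]

for chunk in CHUNKS_KIB:
    print(f"### {chunk}K chunk")
    for line in plan(chunk):
        print(line)
```

Printing the commands instead of running them keeps the sketch harmless, so you can review each step before touching the disks.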

Once you've found the optimal chunk size for your setup, create your 14 TB RAID set. Then test again to make sure performance hasn't degraded.