NFS – Choosing a SAN technology for 100s of VM Web Servers

nfs, storage, storage-area-network, virtualization, vmware-vsphere

The Problem

We have an issue with performance on an existing platform, so I'm turning to the hive mind for a second opinion on this. The performance issue so far relates to IOPS rather than throughput.

The Scenario

A blade centre of 16 hosts, each with 64GB of RAM. (It's a Dell M1000e w/ M610s, but that's probably not relevant)
500 VMs, all web servers or associated web technologies (MySQL, load balancers, etc.); around 90% are Linux and the rest Windows.
Hypervisor is VMware vSphere.
We need to provide host HA, so local storage is out. As such, the hosts just have an SD card to boot from.

A bit of background thinking

At the moment we are up to 6 hosts (the blade centre will be at full capacity in a year's time at current growth) and we are running iSCSI to a Dell MD3220i w/ MD1220 for expansion.

Possible options we have considered, and immediate thoughts along with them:

  • Spreading the VMs across NFS datastores, and running NFS storage that meets the performance requirements for up to a given number of VMs. NFS seems cheaper to scale, as well as being abstracted a bit more than block-level storage, so we can move it around as needed.
  • Adding more MD3220i controllers/targets. We are concerned, though, that doing this could somehow negatively affect how VMware handles having lots of targets.
  • Swapping all disks from Nearline SAS to SSD. This ought to entirely solve the IOPS issue (a rough sizing sketch follows this list), but has the obvious side effect of slashing our storage capacity. It's also still very expensive.
  • vSphere 5 has a storage appliance. We haven't researched it much; does it actually work well in practice?
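
As a sanity check on the NL-SAS vs SSD question, here's a rough back-of-envelope sizing sketch in Python. The per-drive IOPS figures, RAID write penalties and the 25-IOPS-per-VM guess are all assumptions for illustration, not measurements from our platform; real inputs would come from esxtop/vCenter stats.

    import math

    # Back-of-envelope IOPS sizing: how many spindles (or SSDs) does a given
    # random workload need once the RAID write penalty is applied?
    # All per-drive IOPS figures below are rough assumed values.
    DRIVE_IOPS = {
        "7.2k NL-SAS": 80,
        "10k SAS": 140,
        "15k SAS": 180,
        "SSD": 5000,
    }

    RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

    def drives_needed(read_iops, write_iops, drive_type, raid_level):
        """Number of drives needed to service the given front-end workload."""
        backend = read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]
        return math.ceil(backend / DRIVE_IOPS[drive_type])

    # Hypothetical example: 500 VMs averaging 25 IOPS each, 70/30 read/write mix.
    total = 500 * 25
    reads, writes = total * 0.7, total * 0.3
    for dtype in DRIVE_IOPS:
        print(f"{dtype:12}: RAID10 {drives_needed(reads, writes, dtype, 'RAID10'):>4} drives,"
              f" RAID6 {drives_needed(reads, writes, dtype, 'RAID6'):>4} drives")

Even with guessed numbers, this makes the capacity/IOPS trade-off between big NL-SAS spindles and a smaller number of SSDs obvious at a glance.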

The Question

What sort of storage would you run underneath all of that? It wouldn't need to scale to another blade centre; it would just need to provide relatively good performance for all of those VMs.

I'm not looking for "Buy SAN x because it's the best" answers. I'm looking for thoughts on the various SAN technologies (iSCSI, FC, FCoE, InfiniBand, NFS, etc), different types of storage (SATA, SAS, SSD), and methodologies for handling storage for 100s of VMs (Consolidation, Separation, Sharding, etc).

Absolutely any thoughts, links, guides, pointers etc are welcome on this. I'd also love to hear thoughts on the above options we'd already considered.

Many thanks in advance for any input!

Update 5th March '12

Some fantastic responses so far, thank you very much everyone!

Going by the responses to this question so far, I'm beginning to think the following route is the way forward:

  • Tier the available storage to the VMware cluster and place VM disks on suitable storage for their workloads (see the classification sketch after this list).
  • Potentially make use of a SAN that is able to manage the placement of data onto suitable storage automagically.
  • InfiniBand looks to be the most cost-effective way to get the required bandwidth with the hosts at full capacity.
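
To make the tiering idea concrete, here's a minimal sketch of what manual placement could look like: bucket VMs by measured average IOPS and map each bucket to a class of datastore. The thresholds and the sample per-VM numbers are invented for illustration; the real inputs would be vCenter/esxtop statistics.

    # Sketch of manual storage tiering: classify VMs by average IOPS and map
    # each class to a datastore type. Thresholds and measurements are made up.
    TIERS = [               # (tier name, minimum average IOPS to qualify)
        ("ssd", 500),
        ("fast-sas", 100),
        ("nearline", 0),
    ]

    def assign_tier(avg_iops):
        for name, threshold in TIERS:
            if avg_iops >= threshold:
                return name

    # Hypothetical measurements; in practice pull these from vCenter stats.
    vm_stats = {"web-042": 35, "mysql-03": 820, "lb-01": 12, "mysql-11": 140}

    placement = {}
    for vm, iops in sorted(vm_stats.items(), key=lambda kv: kv[1], reverse=True):
        tier = assign_tier(iops)
        placement.setdefault(tier, []).append(vm)
        print(f"{vm:10} {iops:5} IOPS -> {tier}")

An array that does sub-LUN auto-tiering would make this placement decision itself, but the same bucketing logic applies either way.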

It definitely sounds like it would be worth making use of the pre-sales services of a major SAN vendor to get their take on the scenario.

I'm going to continue to consider this problem for a while. In the meantime, any more advice is gratefully received!

Best Answer

The key to a good VMware storage platform is understanding what kind of load VMware generates.

  • First, since you host a lot of servers, the workload is typically random. There are many IO streams going at the same time, and not many of them can be successfully pre-cached.
  • Second, it's variable. During normal operations you may see 70% random reads; however, the instant you decide to move a VM to a new datastore or something, you'll see a massive 60GB sequential write (see the arithmetic sketch after this list). If you're not careful about architecture, this can cripple your storage's ability to handle normal IO.
  • Third, a small portion of your environment will usually generate a large portion of the storage workload.
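
To put a number on the second point, here's a small arithmetic sketch of what a 60GB Storage vMotion-style copy does to a shared array. The bandwidth figures are assumptions for illustration only.

    # How long does a 60GB sequential copy take, and how much write bandwidth
    # does it leave for normal IO while it runs? Bandwidth numbers are assumed.
    move_gb = 60
    array_write_mb_s = 400      # assumed usable sequential write bandwidth (MB/s)
    copy_share = 0.8            # assumed fraction of that the copy grabs

    effective = array_write_mb_s * copy_share
    minutes = move_gb * 1024 / effective / 60
    print(f"60GB move at {effective:.0f} MB/s takes ~{minutes:.1f} minutes")
    print(f"Write bandwidth left for normal IO: {array_write_mb_s - effective:.0f} MB/s")

In other words, for several minutes the array is mostly doing the copy, which is exactly when the normal random workload starts to suffer if the architecture can't absorb it.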

The best way to approach building storage for a VMware platform is to start with the fundamentals.

  • You need the ability to service a large random read workload, which means smaller, faster drives, and possibly SSD. Most modern storage systems allow you to move data around automatically depending on how it's accessed. If you are going to use SSD, you want to ensure this is how you use it: it should be there as a way of gradually reducing hot spots. Whether you use SSD or not, it's beneficial to be able to spread the work across all the drives, so something with a form of storage pooling would be beneficial.
  • You need the ability to service intermittent large writes, which doesn't care as much about the spindle speed of the underlying drives, but does care about the controller stack's efficiency and the size of the cache. If you have mirrored caching (which is not optional unless you're willing to go back to backups whenever you have a controller failure), the bandwidth between the two caches used for mirroring will usually be your bottleneck for large sequential writes (a rough bandwidth sketch follows this list). Ensure that whatever you get has a high-speed controller (or cluster) interconnect for write caching. Do your best to get a high-speed front-end network with as many ports as you can get while remaining realistic on price. The key to good front-end performance is to put your storage load across as many front-end resources as possible.
  • You can seriously reduce costs by having a tier for low priority storage, as well as thin provisioning. If your system isn't automatically migrating individual blocks to cheap large/slow drives (like nearline SAS or SATA with 7200 RPM and 2TB+ sizes), try to do it manually. Large slow drives are excellent targets for archives, backups, some file systems, and even servers with low usage.
  • Insist that the storage is VAAI-integrated so that VMware can de-allocate unused parts of the VMs as well as the datastores.
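
On the mirrored-cache point above, a minimal sketch of the bottleneck: with cache mirroring, every acknowledged write also has to cross the controller interconnect, so the sequential write ceiling is roughly the lesser of front-end and interconnect bandwidth. The figures below are illustrative assumptions, not specs for any particular array.

    # Approximate sequential write ceiling for a mirrored-cache controller pair:
    # every cached write is also copied to the partner controller, so the
    # ceiling is roughly min(front-end bandwidth, cache-mirror interconnect).
    def write_ceiling_mb_s(front_end_mb_s, interconnect_mb_s, mirrored=True):
        if not mirrored:
            return front_end_mb_s
        return min(front_end_mb_s, interconnect_mb_s)

    # Hypothetical configurations (aggregate front-end MB/s, interconnect MB/s).
    configs = {
        "4x 1GbE iSCSI, PCIe x4 cache mirror": (4 * 110, 1600),
        "2x 10GbE iSCSI, 6Gb SAS cache mirror": (2 * 1100, 550),
    }

    for name, (fe, ic) in configs.items():
        print(f"{name}: ~{write_ceiling_mb_s(fe, ic)} MB/s ceiling"
              f" (front end {fe}, interconnect {ic})")

The second hypothetical configuration shows the trap: a fat front end buys you nothing for sequential writes if the cache-mirror path is the narrow point.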