The HP MSA1500CS is a pretty wimpy device. I have one, and I hate it. I'm somewhat surprised it has kept up with your stated workloads. It probably comes as no surprise that I recommend upgrading to the MSA2000. It has a much better storage architecture than the 1500CS, and can scale better.
Without more data I can't recommend going to an EVA4400 (HP's 'entry level enterprise array') versus the MSA2000. The 4400 will take you a lot farther than the MSA2000 will in terms of scale out, but I don't know what kind of growth you expect.
RE: LeftHand vs. MSA2000
So long as you have the ethernet network for it, the LeftHand unit should out-scale the MSA2000 by a long shot. The distributed storage controller it uses makes that kind of thing easy. You'll pay more per storage shelf, but you can scale to silly amounts with it. Once you start hitting the I/O ceilings on an MSA2000 (which will depend on the drive technology you use as well as any active/active configs you can use) you're pretty much done. For the LeftHand products that ceiling is a lot more mushy.
Where the LeftHand approach really saves you is with parity RAID rebuilds. Rebuilding after a failure is the most CPU-intensive thing a storage controller does, and it's where my MSA1500cs falls flat on its ass. On my 1500cs, rebuilding a RAID6 array across 6.5TB of disk took about a week, during which time it was deeply intolerant of large-scale I/O writes to anything on the array. Since LeftHand has a controller in each cabinet, restriping a LUN on one shelf will not affect the performance of LUNs on other shelves. This is very nice!
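To put that week-long rebuild in perspective, a quick back-of-the-envelope calculation (assuming decimal terabytes and a full seven-day restripe, as described above) shows the effective rate the controller managed:

```python
# Back-of-the-envelope check on the MSA1500cs rebuild figure quoted above:
# 6.5 TB restriped in roughly one week.
TB = 10**12  # drive vendors use decimal terabytes

array_bytes = 6.5 * TB
rebuild_seconds = 7 * 24 * 3600  # one week

throughput_mb_s = array_bytes / rebuild_seconds / 10**6
print(f"Effective rebuild rate: {throughput_mb_s:.1f} MB/s")
```

Roughly 10 MB/s across the whole array, which is why everything else sharing that controller suffered for the duration.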
All in all, if you have the budget for it the LeftHand devices should serve you a lot longer than the MSA2000.
The key to a good VMware storage platform is understanding what kind of load VMware generates.
- First, since you host a lot of servers, the workload is typically random. There are many IO streams going at the same time, and not many of them can be successfully pre-cached.
- Second, it's variable. During normal operations you may see 70% random reads; the instant you decide to move a VM to a new datastore, though, you'll see a massive 60GB sequential write. If you're not careful about architecture, this can cripple your storage's ability to handle normal IO.
- Third, a small portion of your environment will usually generate a large portion of the storage workload.
The best way to approach building storage for a VMware platform is to start with the fundamentals.
- You need the ability to service a large random read workload, which means smaller, faster drives, and possibly SSD. Most modern storage systems can move data around automatically depending on how it's accessed. If you are going to use SSD, make sure this is how you use it: as a way of gradually absorbing hot-spots. Whether you use SSD or not, it helps to spread the workload across all the drives, so something with storage pooling is worth having.
- You need the ability to service intermittent large writes. This cares less about the spindle speed of the underlying drives and more about the efficiency of the controller stack and the size of the cache. If you have mirrored caching (which is not optional unless you're willing to go back to backups whenever you have a controller failure), the bandwidth of the link used to mirror the two caches will usually be your bottleneck for large sequential writes. Ensure that whatever you get has a high-speed controller (or cluster) interconnect for write caching. Do your best to get a high-speed front-end network with as many ports as you can while remaining realistic on price. The key to good front-end performance is to spread your storage load across as many front-end resources as possible.
- You can seriously reduce costs by having a tier for low priority storage, as well as thin provisioning. If your system isn't automatically migrating individual blocks to cheap large/slow drives (like nearline SAS or SATA with 7200 RPM and 2TB+ sizes), try to do it manually. Large slow drives are excellent targets for archives, backups, some file systems, and even servers with low usage.
- Insist that the storage is VAAI integrated so that VMware can de-allocate unused parts of the VMs as well as the datastores.
Server hard-drive capacities are minuscule compared to desktop hard-drive capacities. 450GB and 600GB are not uncommon sizes to see in brand new servers, and you could buy many 4TB SATA desktop drives for the price of one 600GB SAS (server) hard drive.
Your SATA hard-drive in your desktop PC at home is like a muscle car from Ford, or GM, or Mercedes, or any other manufacturer of cars for every-day people (large capacity V8 or V12, 5 or 6 litres). Because they need to be driven by people who don't have a racing license, or understand how an internal combustion engine works, they have very large tolerances. They have rev limiters, they're designed to run on any oil of a certain rating, and they have service intervals, say 10,000km apart, but if you miss a service interval by a few weeks it won't explode in your face. They don't catch fire when you drive long distances.
The SAS drive in a server is more akin to a Formula 1 engine. They're really small (2.4 litres) but have immense power outputs because of their tiny tolerances. They rev higher, and often have no rev limiter (which means they suffer serious damage if driven incorrectly), and if you miss a service interval (which is every few hours) they explode.
You're basically comparing chalk and cheese. Numbers and a full breakdown are discussed in the Intel whitepaper "Enterprise-class versus Desktop-class Hard Drives".
Let's talk some hard numbers here. Let's say you request 1MB of additional data (a nice round number). How much data is that really? Well, your 1MB of data is going to go into a RAID array. Let's say they're being safe and making that into RAID1. Your 1MB of data is mirrored, so it's actually 2MB of data.
Let's say your data is inside a SAN. In case of a SAN node failure, your data is synchronized at a byte-level to a 2nd SAN node. So it's duplicated, and your 2MB of data is now 4MB.
You expect your provider to keep on-site backups, so your data can be restored in the case of a non-disaster emergency? Any decent provider is going to provide you with at least 1 on-site backup, perhaps more. Let's say they take snapshots once a week for three weeks on-site. That's an extra 3MB of data, so you're now up to 7MB.
If there is a critical disaster, your provider had better have a copy kept off-site somewhere. Even if it's a month old, it should exist. So now you're up to 8MB.
If it's a really high-level provider, they may even have a disaster recovery site that's synchronized live. These disks will be RAIDed as well, so that's an extra 2MB, and thus you're up to 10MB of data.
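The multiplication above can be tallied up in a few lines. These counts are the ones assumed in this particular scenario (RAID1, two SAN nodes, three weekly snapshots, one off-site copy, a mirrored DR site), not universal constants:

```python
# Tally of where a hosted 1 MB ends up, following the steps above.
logical_mb = 1

raid1_copies = 2          # RAID1 mirror on the primary SAN node
san_nodes = 2             # byte-level sync to a second SAN node
onsite_snapshots = 3      # three weekly on-site snapshots
offsite_backups = 1       # one off-site copy
dr_site_raid1 = 2         # live DR site, also RAID1 mirrored

physical_mb = logical_mb * (raid1_copies * san_nodes
                            + onsite_snapshots
                            + offsite_backups
                            + dr_site_raid1)
print(physical_mb)  # 10
```

So every logical megabyte you request ends up occupying ten physical megabytes somewhere, before you've even counted data transfer.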
You're going to have to transfer that data eventually. What? Transfer it? Yes, data transfer costs money. It costs money when you download it or access it over the internet, and it even costs money to back it up (someone has to take those tapes out of the office, and it could be that your 1MB of data means they have to purchase an extra set of tapes and transfer them somewhere).
When your SATA home drive fails you get to call tech support and convince them your drive is dead. Then send your drive in to the manufacturer (on your own dime most times). Wait a week. Get a replacement drive back and have to reinstall it (it almost certainly isn't hot swappable or in a drive sled already).
When that SAS drive fails you call the tech support. They almost never question your opinion that the drive needs immediate replacement and drop ship a new drive; usually the new drive is delivered later that same day, otherwise the next day is very common too. Commonly the manufacturer will send a representative out to actually install the drive if you don't know how (very handy if you plan on taking a vacation ever and need for things to keep working while you are away).
Enterprise drives have tight tolerances, see #2 above, and tend to last about 10 times longer than consumer-grade drives (MTBF). Enterprise drives almost always support advanced error and failure detection, which a Google report found works about 40% of the time, but that's something anyone would prefer to a computer suddenly dying.
When you have a single drive in your home computer, its statistical chance of failure is simply that of the drive. Drives used to be rated in MTBF (where SAS drives still enjoy ~50% higher ratings or more); now it's more common to see error rates. A typical SAS drive is 10 to 1,000 times less likely to have an unrecoverable error (with 100x being the most common figure I found recently). (Error rates are according to manufacturer documentation supplied by Seagate, Western Digital, and Hitachi; no bias intended; I expressly disclaim indemnification.)
Error rates are particularly important not when you run across an unrecoverable error on a single drive, but when another drive in the same array fails and you are now relying on every remaining drive in the array being fully readable in order to rebuild the failed disk.
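A rough sketch of why this matters during a rebuild, using commonly published unrecoverable-read-error ratings (1 error per 10^14 bits read is a typical consumer SATA rating; 1 per 10^16 is typical for enterprise SAS) and a hypothetical 10 TB of surviving data that must be read to rebuild:

```python
import math

def p_unrecoverable_error(bytes_read, bits_per_error):
    """Chance of at least one unrecoverable read error while reading
    bytes_read bytes, for a drive rated at one error per bits_per_error
    bits. Uses log1p/expm1 to stay numerically accurate."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-1 / bits_per_error))

remaining = 10 * 10**12  # hypothetical 10 TB to read during a rebuild

sata = p_unrecoverable_error(remaining, 10**14)  # typical consumer rating
sas = p_unrecoverable_error(remaining, 10**16)   # typical enterprise rating
print(f"SATA: {sata:.1%}, SAS: {sas:.2%}")
```

With those assumptions the consumer-rated drives have better-than-even odds of tripping over at least one unreadable sector mid-rebuild, while the enterprise rating keeps it under one percent. Real drives often beat their rated spec, so treat this as an illustration of the gap, not a prediction.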
SAS is a derivative of SCSI, which is a storage protocol. SATA is based on ATA, which is itself based on the ISA bus (that 8/16-bit bus in computers from the dinosaur age). The SCSI storage protocol has more extensive commands for optimizing the manner in which data is transferred from drives to controllers and back. This uptick in efficiency would make an otherwise equal SAS drive inherently faster, especially under extreme work loads, than a SATA drive; it also increases the cost.
There are fewer SAS drives produced; economies of scale dictate that they will be more expensive, all else being equal.
SAS drives typically come in 10k or 15k rotational speeds, while SATA drives typically come in 5.4k or 7.2k. SAS drives, particularly the 2.5" size which is becoming increasingly popular, also have faster seek times. The two combined dramatically increase the IOps a drive can perform; typically a SAS drive is ~3x faster. When multiple users are demanding disparate data, the IOps capacity of the drive/array becomes a critical performance indicator.
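The rotational-speed and seek-time effects combine in a simple way: each random I/O costs roughly one average seek plus half a rotation. A sketch with illustrative seek figures (real drives vary by model):

```python
# Rough single-drive IOPS estimate for small random I/O:
# one operation ~= average seek time + rotational latency (half a rotation).
def random_iops(rpm, avg_seek_ms):
    half_rotation_ms = 60_000 / rpm / 2  # ms per half rotation
    return 1000 / (avg_seek_ms + half_rotation_ms)

# Illustrative seek times; actual values vary by drive model.
sata_7200 = random_iops(7200, 8.5)    # roughly 79 IOPS
sas_15k = random_iops(15000, 3.5)     # roughly 182 IOPS
print(f"7.2k SATA ~ {sata_7200:.0f} IOPS, 15k SAS ~ {sas_15k:.0f} IOPS")
```

That simple model already shows a 2-3x gap per spindle on purely random work, before you account for deeper command queuing on the SAS side.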
The drives in a data center are typically powered up all the time. Studies have found that drive failure is influenced by the number of heating/cooling cycles it goes through (from running vs turned off). Keeping them running all the time typically increases the drive's life. The consequence of this is that the drives consume electricity. This electricity has to be supplied by something (in the case of a large DC the drives alone might take more power than a small neighborhood of houses). They also need to dissipate that heat somewhere, requiring cooling systems (which themselves take more power to operate).
Infrastructure and staffing costs. Those drives are in high-end NAS or SAN units. Those units are expensive, even without the expensive drives in them. They require expensive staff to deploy and maintain them. The buildings that those NAS and SAN units are in are expensive to operate (see the point about cooling, above, but there's a lot more going on there.) The backup software is typically not free (nor are the licenses for things like mirroring), and the staff to deploy and maintain backups are usually pricey too. The cost of renting off-site tape delivery and storage is just one more of the many things that start to pile up when you need more storage.
Keeping in mind that enterprise drives may well have 1/10th the capacity of a desktop drive at five times the price, that your 1MB of data is actually 10MB, and all the other differences above, there's no way you can draw any meaningful conclusion by comparing the price of your desktop storage to the price of enterprise-level storage.