SAN system upgrade/design

storage storage-area-network vmware-esx

Greetings, I'm a long-time lurker, first-time poster. For brevity, let me ask my question up front and then give the longer backstory, so you can choose how much you want to read.

Short version…

We're in need of a SAN upgrade: definitely additional capacity, and very possibly better performance. Our main workloads will be around 20 assorted VMs and a couple of SQL databases. In addition to a few domain controllers, Exchange, and assorted standard corporate services, the main workload is hosting an application over Citrix that will be accessed by approximately 300 users throughout each day. Currently we have an HP MSA1500cs SAN controller with 2 MSA30 enclosures, nearly fully populated with about 3.7 TB of raw drive space. Pure capacity isn't really the big concern so much as performance and reliability. I don't see our capacity needs outgrowing 10 TB in the foreseeable future. The question is, should we just add an enclosure or 2 to our existing controller? Should we upgrade to the current generation of MSA products, the MSA2000 series? Should we move up to the EVA4400 family? Should we be looking at the recently acquired LeftHand SAN solutions? If we end up doing something besides just adding space to our current controller, should we stick with Fibre Channel or be looking at iSCSI? The budget is not really set, as this is part of a larger project with a big budget umbrella, but I'd say we want to be under $50k and cheaper is always better.

Long version…

We are about to expand the services our server farm provides. If you want the details, we're a nursing home corporation with about 45 nursing homes. We will be implementing fully electronic charting, meaning our usage of the database and Citrix specifically will increase significantly. Right now there are probably about 2 nurses at each facility who actually interact with our medical records software every day. This will change to probably around 6 nurses at each facility, using the software far more often than they do now.
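As a rough back-of-the-envelope check on that growth, here is a minimal sketch using only the figures above (about 45 facilities, about 2 daily users each today, about 6 each after the change). The function name is made up for illustration, and the per-facility numbers are my estimates, not measurements.

```python
def daily_chart_users(facilities: int, nurses_per_facility: int) -> int:
    """Estimated nurses using the medical records software each day."""
    return facilities * nurses_per_facility


FACILITIES = 45  # approximate facility count from the post

current = daily_chart_users(FACILITIES, 2)  # today's estimate
planned = daily_chart_users(FACILITIES, 6)  # after electronic charting
growth = planned / current                  # multiplier on daily users

print(f"current: {current}, planned: {planned}, growth: {growth:.1f}x")
```

That works out to roughly 90 daily users today versus about 270 afterwards, which is in line with the "approximately 300 users" figure in the short version. It says nothing about how much harder each user will hit the database, which I expect to go up as well.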

In addition to the medical records system, we provide Active Directory and Exchange for around 600 users, a payroll system, and the usual assortment of miscellaneous services. The current database (MS SQL 2005) for the medical records system is about 30 GB. It will grow some with the new usage model, but the frequency of access will increase far more than the raw size.

Before I get any more specific with hardware, let me say that we're an HP house and my boss, the Director of IS, is pretty hard-headed about going with any other vendor. You and I may not agree with the HP choice, but it's pretty set in stone.

We're upgrading our server farm from 16 HP BL20p blades with older Xeon CPUs (dual-socket machines, but most only have a single CPU in them currently) and not very much memory by today's standards (8 GB max supported, most have much less than that). The replacements will probably be modern stand-alone servers, such as an HP DL580 G5 with 4 sockets, each with a 4- or 6-core modern Xeon or Opteron, and 128 GB of memory per server. We're currently using VMware ESX 3 and plan to upgrade to a current version of VMware. I would appreciate comments on the servers too, but my main question is about our SAN, so keep reading.

I have been tasked with researching an upgrade to our current SAN solution. We currently have an HP MSA1500cs controller and 2 fully populated drive enclosures, with about 3.7 TB of raw drive space. These use Ultra320 SCSI drives; the enclosures talk to the controller over U320 connections, and the controller connects to the server farm through a 4 Gb, 32-port Fibre Channel SAN switch. We will need to add a little space to implement the change, but mostly I am concerned with performance and reliability. I don't see our total storage ever outgrowing the ballpark of 10 TB.

I'm pretty new to the world of SANs, and it's a bit overwhelming at this point. As I see it, we have 3 main options. We can add more enclosures to our current system. It supports up to 8 enclosures and we only have 2 right now, so this would be a very simple upgrade path. We could also upgrade to the current generation of HP's MSA family, the MSA2000 series. Our 3rd option, as I see it, would be to upgrade to the next class of SAN, the HP EVA series. HP says that the MSA family is considered an entry-level SAN, which is what makes me think we might need something more substantial, but I realize that's the marketing department speaking.

If we just add some enclosures to our current controller, we have enough Fibre Channel ports to connect the new servers, especially since we'll be retiring several old servers. If we do upgrade to a new SAN system, this brings up the question of whether to continue to use Fibre Channel, or to go with a newer (and generally cheaper) technology such as iSCSI or FCoE.

I appreciate any comments or answers, and if anything needs clarifying just ask and I'll try to give you as much information as possible. Thanks in advance!

Best Answer

The HP MSA1500CS is a pretty wimpy device. I have one, and I hate it. I'm somewhat surprised it has kept up with your stated workloads. It probably comes as no surprise that I recommend upgrading to the MSA2000. It has a much better storage architecture than the 1500CS, and can scale better.

Without more data I can't recommend going to an EVA4400 (HP's 'entry level enterprise array') versus the MSA2000. The 4400 will take you a lot farther than the MSA2000 will in terms of scale out, but I don't know what kind of growth you expect.

RE: LeftHand vs. MSA2000

So long as you have the ethernet network for it, the LeftHand unit should out-scale the MSA2000 by a long shot. The distributed storage controller it uses makes that kind of thing easy. You'll pay more per storage shelf, but you can scale to silly amounts with it. Once you start hitting the I/O ceilings on an MSA2000 (which will depend on the drive technology you use as well as any active/active configs you can use) you're pretty much done. For the LeftHand products that ceiling is a lot more mushy.

Where the LeftHand approach really saves you is with parity RAID. Doing rebuilds after a failure is the most CPU intensive thing it does, and is where my MSA1500cs falls flat on its ass. On my 1500cs, rebuilding a RAID6 array across 6.5TB of disk took about a week, during which time it was deeply intolerant of large scale I/O writes to anything on the array. Since LeftHand has a controller in each cabinet, restriping a LUN on one shelf will not affect performance of LUNs on other shelves. This is very nice!
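To put that rebuild time in perspective, here is a quick sketch of the effective rebuild rate it implies. It assumes "about a week" means 7 full days, that a terabyte is 10^12 bytes, and that the whole 6.5 TB had to be processed; the function name is just for illustration.

```python
TB = 10**12               # decimal terabyte (assumption)
SECONDS_PER_DAY = 86_400


def rebuild_rate_mb_per_s(array_bytes: float, days: float) -> float:
    """Average throughput in MB/s needed to cover array_bytes in the given days."""
    return array_bytes / (days * SECONDS_PER_DAY) / 10**6


# 6.5 TB rebuilt in roughly 7 days, per the figures above
rate = rebuild_rate_mb_per_s(6.5 * TB, 7)
print(f"effective rebuild rate: {rate:.1f} MB/s")
```

Under those assumptions the array averaged somewhere around 10 to 11 MB/s for the whole rebuild, which is far below what the drives themselves can stream. That points at the controller as the bottleneck.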

All in all, if you have the budget for it the LeftHand devices should serve you a lot longer than the MSA2000.
