Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
In short, there probably isn't a way to be truly sure. What I would say (I am a SAN admin) is that if your applications are performing up to your expectations, don't worry about it. If you start to see performance issues that you believe could be related to SAN/disk IO performance, then it might be wise to inquire. I don't use much HP storage like you do, but in the IBM/NetApp world I can say from experience that there aren't many options that would let you configure it "poorly". Most enterprise storage these days takes a lot of the guesswork out of building RAID arrays and doesn't really let you do it wrong. Unless they are mixing drive speeds and capacities within the same RAID groups, you can rest assured in most cases that your disk is performing fine.
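If you want numbers rather than just reassurance, SQL Server can tell you what latency it is actually seeing, without involving the SAN team at all. This is a minimal sketch, assuming SQL Server 2005 or later (where the sys.dm_io_virtual_file_stats DMV exists); the latency thresholds in the comments are rough rules of thumb, not anything from this thread:

    -- Average read/write latency per database file, accumulated since the
    -- last instance restart. Sustained data-file reads above roughly 20 ms
    -- (or log writes above ~5 ms) are a reasonable trigger for a polite
    -- conversation with the storage team.
    SELECT  DB_NAME(vfs.database_id)  AS database_name,
            mf.physical_name,
            vfs.num_of_reads,
            vfs.num_of_writes,
            1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
            1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN    sys.master_files AS mf
            ON  mf.database_id = vfs.database_id
            AND mf.file_id     = vfs.file_id
    ORDER BY avg_read_ms DESC;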
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Load testing should be plenty reliable. Just keep in mind that when you load test one box on a shared SAN/disk array, its performance can (and will) be affected by other systems using the same storage.
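One way to make a load test comparable across runs on shared storage is to snapshot the same DMV before and after each run and diff the counters, so you measure only the latency your test generated. A sketch, again assuming SQL Server 2005+; the #io_before temp table name is just illustrative:

    -- Snapshot cumulative file IO stats before the test...
    SELECT  database_id, file_id, num_of_reads, num_of_writes,
            io_stall_read_ms, io_stall_write_ms
    INTO    #io_before
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL);

    -- ...run the load test, then compute per-run average latency.
    SELECT  DB_NAME(a.database_id) AS database_name, a.file_id,
            1.0 * (a.io_stall_read_ms  - b.io_stall_read_ms)
                / NULLIF(a.num_of_reads  - b.num_of_reads, 0)  AS avg_read_ms,
            1.0 * (a.io_stall_write_ms - b.io_stall_write_ms)
                / NULLIF(a.num_of_writes - b.num_of_writes, 0) AS avg_write_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS a
    JOIN    #io_before AS b
            ON  b.database_id = a.database_id
            AND b.file_id     = a.file_id;

Repeating that at different times of day also shows you how much the array's other tenants move the numbers.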
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
It can. It is not all about the disks, or which disks the servers are on. All of the data is served up by a disk controller and then a SAN switch. The performance you see depends greatly on how the disk controller is connected to its corresponding disk shelves and to the corresponding SAN. If the entire array connects to the backbone SAN on one single strand of 4 Gbps fiber, then clearly performance will be impacted. If the array is connected across two redundant, load-balanced SANs using trunked links, then it would be impossible for Exchange alone to suck up too much bandwidth. Another thing to consider is how many IOs/sec the array is capable of. As long as the array and the SAN it is connected to are scaled correctly, heavy IO in other parts of the SAN environment should not impact your SQL performance.
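If you suspect a noisy neighbor rather than your own workload, the instance's wait statistics are a cheap first check: IO waits climbing without a corresponding rise in your own activity is suggestive. A sketch using sys.dm_os_wait_stats (counters are cumulative since the last restart, so compare snapshots over time rather than reading them once):

    -- IO-related waits: PAGEIOLATCH_* covers data-page reads from disk,
    -- WRITELOG covers transaction log flushes.
    SELECT  wait_type,
            waiting_tasks_count,
            wait_time_ms,
            wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
    FROM    sys.dm_os_wait_stats
    WHERE   wait_type LIKE 'PAGEIOLATCH%'
       OR   wait_type IN ('WRITELOG', 'IO_COMPLETION')
    ORDER BY wait_time_ms DESC;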
Would requesting separate logical drives for different functions (data vs. log vs. tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
That is probably a matter of preference, and it also depends greatly on how your storage admins configure it. They could give you three LUNs in the same array or volume, in which case it's all the same anyway. If they gave you individual LUNs on different arrays, in different volumes (physically different disks), then it might be worth it for you to separate them.
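Before asking for anything, it's worth seeing how the files are laid out today. A quick sketch; note that identical drive letters only prove the files share a Windows volume, and whether those volumes share physical disks is exactly the question for the SAN team:

    -- Current placement of data (ROWS) and LOG files across all databases.
    SELECT  DB_NAME(database_id) AS database_name,
            type_desc,           -- ROWS = data file, LOG = log file
            name                 AS logical_name,
            physical_name
    FROM    sys.master_files
    ORDER BY physical_name;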
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
I don't imagine your storage admin would change the RAID level in order to free up space. If he did, he should probably be fired. Space concerns can lead to things being configured differently, but not normally in a performance-impacting way. They might just become a little tighter about how much space they give you, or they might enable features such as data de-duplication (if the array supports it), which can hinder the performance of the array while the process runs, though not around the clock.
I guess this depends on your definition of white box.
Lefthand is really just an HP server with a custom OS on it.
Using Datacore you could do a similar thing: an HP server with Windows and the Datacore software.
I've seen plenty of examples where both of these outperform EMC systems, and I'm not talking bottom-of-the-barrel AX systems either.
For a white box setup, the performance issues really come down to the RAID cards. If you're putting HP P800 cards in, they cost close to $1000 and up anyway, but they perform very well. Put one in a Datacore system with 32 GB of memory and all of a sudden you have twice as much cache as a high-end EMC system. Cache is where your performance lies, but the more data sitting in cache, the more that is not yet on disk. How often is the cache being written to disk? What sort of problems arise when there is a failure? An individual UPS for that kind of white box is a great start, especially when you can interface it with the server for controlled cache flushing and such.
Generally a whitebox SAN won't perform as well as a commercial SAN, though that generally comes down to components, and to just what you consider a whitebox SAN.
I don't consider a Datacore or Lefthand system whitebox, because they are commercially supported software products, despite the fact that they are really just software layers on top of industry-standard servers.
Edit: Just to mirror what was said below: if the storage is important and mission-critical, it is vital that you have adequate support. That means not just buying the best product, but the one that offers the best support. That's not always the big names, but it's really not likely to be whoever is out back putting that white box together, is it? :)
Best Answer
If this P2000 is anything like my old MSA4400, then vDisks are volumes that are assigned to servers as LUNs, but under the covers, what is happening on the HP has little in common with a Clariion.
The way I remember it, the HP has a bunch of disks that it sets up in fixed groups (with RAID, I think), and then vDisks are created that live on these disks with their own virtual RAID. So you could have one vDisk with virtual RAID 10, meaning each block or chunk that comprises the vDisk is saved twice, and another with virtual RAID 5, meaning the chunks are saved once but with distributed parity.
I'm a little hazy on the details of the actual disks, specifically whether there was RAID under all these vDisks or just a JBOD. I do remember that we had global spares, because we had several dozen of about 100 disks fail over the course of three weeks, and the system was able to take the hits until more than 9 were rebuilding at the same time.
Maybe someone with more recent and specific P2000 experience can chime in here and help, but this is what I remember.