Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
In short, there probably isn't a way to be truly sure. What I would say (I am a SAN admin) is that if your applications are performing up to your expectations, don't worry about it. If you start to see performance issues that you believe could be related to SAN/disk IO performance, then it might be wise to inquire. I do not use much HP storage like you do, but in the IBM/NetApp world I can say from experience that there aren't many options which would allow you to configure it "poorly". Most enterprise storage these days takes a lot of the guesswork out of building RAID arrays, and doesn't really let you do it wrong. Unless they are mixing drive speeds and capacities within the same RAID groups, you can rest assured in most cases that your disk is performing fine.
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Load testing should be plenty reliable. Just keep in mind that when you load test one box on a shared SAN/disk array, its performance can (and will) be affected by other systems using the same storage.
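To see that shared-storage noise for yourself, one simple approach is to sample write latency at different times of day and compare the spread between runs. This is a minimal Python sketch of the idea, not a substitute for a real tool like sqlio; the function name and parameters are my own invention for illustration.

```python
import os
import statistics
import time

def sample_write_latency(path, block_size=8192, samples=200):
    """Time synchronous 8 KB writes to estimate storage write latency (ms)."""
    latencies = []
    buf = os.urandom(block_size)
    with open(path, "wb") as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # force the write through to storage
            latencies.append((time.perf_counter() - start) * 1000)
    os.remove(path)
    return statistics.mean(latencies), statistics.stdev(latencies)

# Run the same test at different times of day: a mean that drifts between
# runs, or a large standard deviation, suggests contention from other
# systems sharing the array.
```

The point is repeatability: on dedicated disks, runs should look alike; on a busy shared array, they often won't.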
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
It can. It is not all about the disks, or which disks, the servers are on. All of the data is being served up via a disk controller, and then a SAN switch. The performance you will see greatly depends on how the disk controller is connected to its corresponding disk shelves, and the corresponding SAN. If the entire array connects to the backbone SAN on one single strand of 4 Gbps fiber, then clearly the performance will be impacted. If the array is connected across two redundant SANs which are load balanced, using trunked links, then it would be impossible for Exchange alone to suck up too much bandwidth. Another thing which needs to be considered is how many IO/sec the array is capable of. As long as the array and the SAN it is connected to are scaled correctly, heavy IO in other parts of the SAN environment should not impact your SQL performance.
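To see why a single link becomes the bottleneck, a back-of-the-envelope calculation helps. The helper below is hypothetical and deliberately ignores Fibre Channel protocol overhead, so the real ceiling would be somewhat lower:

```python
def iops_ceiling(link_gbps, io_size_kb):
    """Upper bound on IO/s a single link can carry, ignoring protocol overhead."""
    bytes_per_sec = link_gbps * 1e9 / 8          # Gbps -> bytes per second
    return bytes_per_sec / (io_size_kb * 1024)   # divide by IO size in bytes

# A single 4 Gbps strand moving 8 KB IOs tops out around 61,000 IO/s,
# shared across every server behind that link:
print(round(iops_ceiling(4, 8)))  # -> 61035
```

If Exchange and SQL Server together routinely need more than that, the link, not the spindles, is what they are fighting over.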
Would requesting separate logical drives for different functions (data vs log vs tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
That is probably a matter of preference, and also greatly depends on how your storage admins configure it. They could give you three LUNs in the same array or volume, in which case it's all the same anyway. If they gave you individual LUNs on different arrays, in different volumes (physically different disks), then it might be worth it for you to separate them.
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
I don't imagine your storage admin would change the RAID level in order to free up space. If he would, then he should probably be fired. Space concerns can lead to things being configured differently, but not normally in a performance-impacting way. They might just become a little more tight about how much space they give you. They might enable features such as data de-duplication (if the array supports it), which can hinder the performance of the array while the process runs, but not around the clock.
I've only read about this topic a handful of times. I found an article at www.vmguy.com that sums up the consensus on this HT issue (direct from the article):
There are pros and cons to using HT in ESX.

Pros
- Better co-scheduling of SMP VMs: Hyperthreading provides more CPU contexts, and because of this SMP VMs can be scheduled to run in scenarios which would not have enough CPU contexts without Hyperthreading.
- Typical applications see performance improvement in the 0-20% range (the same as non-virtualized workloads).

Cons
- Processor resources are shared with Hyperthreading enabled: resources such as the L2 and L3 caches are shared, so the two threads running on the same processor compete for the same resources if they both have high demand for them. This can, in turn, degrade performance.

All things considered, it is difficult to generalize the performance impact of Hyperthreading; it is highly dependent on the workload of the VM. One additional point is that you can always utilize the CPU min and max values on a per-VM or Resource Pool basis to reserve certain amounts of CPU for your most critical workloads.

As with the majority of performance items I encounter: test, test, test. Try out the workloads and see what works best on the hardware you have available.
Again, this is directly from the article. I'm not certain that HT is worth it; I too use AMD Opterons, so I can't speak from experience.
Best Answer
In the SQLOS, a scheduler is created for each logical processor that SQL Server sees. With hyperthreading enabled, this equates to double the schedulers. One of the purposes of the SQLOS is to minimize and prevent context switching from occurring, which is why only one scheduler is created per logical processor. Once the SQLOS creates the schedulers, the total number of workers is divided amongst them. The SQLOS implements a form of cooperative scheduling: a worker yields the scheduler when it requires unavailable resources or reaches its execution quantum, allowing other workers to execute on that scheduler. This keeps context switching to a minimum, since schedulers and logical processors are bound one to one.
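The cooperative pattern described above can be illustrated with a toy Python sketch. This is not SQLOS, just the general shape: workers voluntarily yield after a quantum of work, and one scheduler round-robins through its runnable queue without preempting anyone. All names here are invented for the example.

```python
from collections import deque

def worker(name, work_units, quantum=2):
    """A cooperative worker: does up to one quantum, then yields control."""
    done = 0
    while done < work_units:
        done += min(quantum, work_units - done)
        yield f"{name} completed {done}/{work_units}"

def scheduler(workers):
    """One scheduler runs its workers in turn; because workers yield
    voluntarily, there is no preemptive context switch."""
    runnable = deque(workers)
    log = []
    while runnable:
        w = runnable.popleft()
        try:
            log.append(next(w))     # run the worker until it yields
            runnable.append(w)      # back of the runnable queue
        except StopIteration:
            pass                    # worker finished; drop it
    return log

trace = scheduler([worker("A", 4), worker("B", 3)])
# -> ['A completed 2/4', 'B completed 2/3', 'A completed 4/4', 'B completed 3/3']
```

The design point mirrors the text: control changes hands only at yield points chosen by the workers, which is what keeps involuntary context switches off the table.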
Understanding that background, hyperthreading works somewhat opposite to how SQLOS is specifically designed to function. Specifically, parallelism can be problematic with hyperthreading and can result in high CXPACKET waits, since SQLOS may try to run a query at DOP 8 on what is in reality a DOP 4 system. If your CPU utilization is low you might not notice, but the higher your CPU utilization goes, the more problematic it could become. I recently had a discussion on Twitter regarding this, and the consensus was "It Depends" as to whether it would help or hurt.
If you have a lot of signal waits on your server but low CPU utilization, you may see a benefit from enabling hyperthreading: it will double your internal schedulers and spread the workers out more, which means they won't wait as long to execute in the runnable queue. However, if your workload utilizes parallelism heavily and you see heavy CXPACKET waits in sys.dm_os_wait_stats, you might look at disabling hyperthreading to see if it reduces your waits.
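One common rule of thumb for the "lots of signal waits" check is the signal-to-total wait ratio: signal wait time is time workers spent in the runnable queue waiting for a scheduler, so a high ratio points at CPU pressure rather than resource waits. A small sketch of that calculation, using made-up sample numbers in the shape of sys.dm_os_wait_stats rows (in practice you would query the DMV itself):

```python
def signal_wait_ratio(wait_stats):
    """Fraction of total wait time that was signal wait (runnable-queue /
    CPU pressure) rather than resource wait."""
    total = sum(row["wait_time_ms"] for row in wait_stats)
    signal = sum(row["signal_wait_time_ms"] for row in wait_stats)
    return signal / total if total else 0.0

# Hypothetical sample rows for illustration only:
sample = [
    {"wait_type": "CXPACKET", "wait_time_ms": 5000, "signal_wait_time_ms": 500},
    {"wait_type": "SOS_SCHEDULER_YIELD", "wait_time_ms": 2000, "signal_wait_time_ms": 1900},
]
ratio = signal_wait_ratio(sample)  # 2400 / 7000, roughly 0.34
```

There is no universal threshold, but a ratio that climbs over time on the same workload is the kind of signal-wait evidence the paragraph above is talking about.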