Logs and data drives have data access patterns that conflict with each other (at least in theory) when they share a drive.
Log Writes
Log access consists of a very large number of small sequential writes. Somewhat simplistically, DB logs are ring buffers containing a list of instructions to write data items out to particular locations on the disk. These writes must be guaranteed to complete, so they are forced out to disk.
Ideally, logs should be on a quiet (i.e. not shared with anything else) RAID-1 or RAID-10 volume. Logically, you can view the process as the main DBMS writing out log entries and one or more log reader threads that consume the logs and write the changes out to the data disks (in practice, the process is optimised so that the data writes happen immediately where possible). If there is other traffic on the log disks, these other accesses move the heads around and the sequential log writes become random log writes. Random writes are much slower, so busy log disks can create a hotspot which acts as a bottleneck on the whole system.
Data Writes
(updated) Log writes must be committed to disk (referred to as stable media) for a transaction to be valid and eligible to commit. One can logically view this as log entries being written and then used as instructions to write data pages out to the disk by an asynchronous process. In practice the disk page writes are actually prepared and buffered at the time the log entry is made, but they do not need to be written immediately for the transaction to commit. The buffered pages are written out to stable media (disk) by the Lazy Writer process (thanks to Paul Randal for pointing this out), which this TechNet article discusses in a bit more detail.
This is a heavily random access pattern, so sharing the same physical disks with logs can create an artificial bottleneck on system performance. The log entries must be written for the transaction to commit, and random seeks slow this process down (random I/O is much slower than sequential log I/O), turning the log from a sequential into a random access device. This creates a serious performance bottleneck on a busy system and should be avoided. The same applies when sharing temporary areas with log volumes.
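A quick way to see the difference between the two patterns is to time small synchronous writes done sequentially versus at random offsets. This is a minimal sketch, not a substitute for a proper tool like sqlio or iozone; `write_latency` is a hypothetical helper of my own, and on SSDs or heavily cached storage the gap will be far smaller than on spinning disks:

```python
import os
import random
import tempfile
import time


def write_latency(path, block=512, count=200, sequential=True,
                  file_size=8 * 1024 * 1024):
    """Mean latency of small synchronous writes: log-style if sequential,
    data-style if random. The fsync after every write models a DBMS
    forcing each write out to stable media before continuing."""
    data = os.urandom(block)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        if sequential:
            offsets = [i * block for i in range(count)]
        else:
            offsets = [random.randrange(0, file_size - block)
                       for _ in range(count)]
        start = time.perf_counter()
        for off in offsets:
            os.lseek(fd, off, os.SEEK_SET)
            os.write(fd, data)
            os.fsync(fd)  # force to stable media, like a log commit
        return (time.perf_counter() - start) / count
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bench.dat")
        print("sequential:", write_latency(path))
        print("random:    ", write_latency(path, sequential=False))
```

On a dedicated spindle the sequential numbers stay low; run the random variant against the same disk and you see exactly the head-movement penalty described above.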
The role of caching
SAN controllers tend to have large RAM caches, which can absorb the random access traffic to a certain extent. However, for transactional integrity it is desirable to have disk writes from a DBMS guaranteed to complete. When a controller is set to use write-back caching, the dirty blocks are cached and the I/O call is reported as complete to the host.
This can smooth out a lot of contention problems, as the cache can absorb a lot of I/O that would otherwise go out to the physical disk. It can also optimise the parity reads and writes for RAID-5, which lessens the performance penalty of RAID-5 volumes.
These are the characteristics that drive the 'Let the SAN deal with it' school of thought, although this view has some limitations:
Write-back caching still has failure modes that can lose data, and the controller has fibbed to the DBMS, saying blocks have been written out to disk when in fact they haven't. For this reason, you may not want to use write-back caching for a transactional application, particularly something holding mission-critical or financial data where data integrity problems could have serious consequences for the business.
SQL Server (in particular) uses I/O in a mode where a flag (called FUA, for Force Unit Access) forces physical writes to the disk before the call returns. Microsoft has a certification program, and many SAN vendors produce hardware that honours these semantics (requirements summarised here). In this case no amount of cache will optimise disk writes, which means that log traffic will thrash if it is sitting on a busy shared volume.
If the application generates a lot of disk traffic, its working set may overrun the cache, which will also cause write contention issues.
If the SAN is shared with other applications (particularly on the same disk volume), traffic from other applications can generate log bottlenecks.
Some applications (e.g. data warehouses) generate large transient load spikes that make them quite anti-social on SANs.
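The write-through semantics behind the FUA point can be reproduced from ordinary application code. On Windows it means opening the file with FILE_FLAG_WRITE_THROUGH; the POSIX analogue is the O_SYNC/O_DSYNC open flags. A hedged sketch, where `durable_append` is an illustrative helper of mine and not SQL Server's actual code path:

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append a record so that the write call does not return until the
    data reaches stable media -- the POSIX analogue of the write-through
    (FUA) semantics SQL Server requires for its log writes. Uses O_DSYNC
    where the platform provides it, falling back to O_SYNC."""
    sync_flag = getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | sync_flag,
                 0o600)
    try:
        os.write(fd, record)  # completes only once the data is durable
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        durable_append(os.path.join(d, "log.bin"), b"entry-1\n")
```

Writes issued this way bypass the benefit of a write-back cache by design, which is exactly why FUA log traffic on a busy shared volume thrashes regardless of how much cache the controller has.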
Even on a large SAN, separate log volumes are still recommended practice. You may get away without worrying about layout on a lightly used application. On really large applications, you may even get a benefit from multiple SAN controllers. Oracle publish a series of data warehouse layout case studies where some of the larger configurations involve multiple controllers.
Put responsibility for performance where it belongs
On something with large volumes or where performance could be an issue, make the SAN team accountable for the performance of the application. If they are going to ignore your recommendations for configuration, then make sure that management are aware of this and that responsibility for system performance lies in the appropriate place. In particular, establish acceptable guidelines for key DB performance statistics like I/O waits or page latch waits, or acceptable application I/O SLAs.
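As a concrete starting point for those guidelines, SQL Server exposes cumulative per-file I/O counters in sys.dm_io_virtual_file_stats; average latency over an interval is simply the delta in stall time divided by the delta in operation counts. A small sketch, assuming the two dicts are snapshots of the real DMV columns taken some interval apart (the column names match the DMV, but the helper itself is hypothetical):

```python
def io_latency_ms(sample_start: dict, sample_end: dict) -> dict:
    """Average read/write latency (ms per operation) between two
    snapshots of the cumulative counters in
    sys.dm_io_virtual_file_stats (num_of_reads, io_stall_read_ms,
    num_of_writes, io_stall_write_ms)."""
    out = {}
    for op in ("read", "write"):
        ops = sample_end[f"num_of_{op}s"] - sample_start[f"num_of_{op}s"]
        stall = (sample_end[f"io_stall_{op}_ms"]
                 - sample_start[f"io_stall_{op}_ms"])
        out[op] = stall / ops if ops else 0.0
    return out


# Hypothetical snapshots, e.g. taken ten minutes apart:
before = {"num_of_reads": 100, "io_stall_read_ms": 500,
          "num_of_writes": 200, "io_stall_write_ms": 400}
after = {"num_of_reads": 200, "io_stall_read_ms": 2500,
         "num_of_writes": 400, "io_stall_write_ms": 1400}
print(io_latency_ms(before, after))  # {'read': 20.0, 'write': 5.0}
```

Numbers like these give you something objective to put into an SLA, rather than arguing about whether the storage "feels slow".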
Note that having responsibility for performance split across multiple teams creates an incentive to finger-point and pass the buck to the other team. This is a known management anti-pattern and a formula for issues that drag out for months or years without ever being resolved. Ideally, there should be a single architect with authority to specify application, database and SAN configuration changes.
Also, benchmark the system under load. If you can arrange it, secondhand servers and direct-attach arrays can be purchased quite cheaply on eBay. If you set up a box like this with one or two disk arrays, you can fiddle with the physical disk configuration and measure the effect on performance.
As an example, I have done a comparison between an application running on a large SAN (an IBM Shark) and a two-socket box with a direct-attach U320 array. In this case, £3,000 worth of hardware purchased off eBay outperformed a £1M high-end SAN by a factor of two - on a host with roughly equivalent CPU and memory configuration.
From this particular incident, it might be argued that having something like this lying around is a very good way to keep SAN administrators honest.
Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
In short, there probably isn't a way to be truly sure. What I would say (I am a SAN admin) is that if your applications are performing up to your expectations, don't worry about it. If you start to see performance issues that you believe could be related to SAN/disk IO performance, then it might be wise to inquire. I do not use much HP storage like you do, but in the IBM/NetApp world I can say from experience that there aren't many options which would allow you to configure it "poorly". Most enterprise storage these days takes a lot of the guesswork out of building raid arrays, and doesn't really let you do it wrong. Unless they are mixing drive speeds and capacities within the same raid groups, you can rest assured in most cases that your disk is performing fine.
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Load testing should be plenty reliable. Just keep in mind that when you are load testing one box on a shared SAN/disk array, its performance can (and will) be affected by other systems using the same storage.
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
It can. It is not only about the disks, or which disks the servers are on. All of the data is being served up via a disk controller and then a SAN switch. The performance you will see greatly depends on how the disk controller is connected to its corresponding disk shelves, and to the corresponding SAN. If the entire array connects to the backbone SAN on one single strand of 4 Gbps fiber, then clearly the performance will be impacted. If the array is connected across two redundant SANs which are load balanced, using trunked links, then it would be impossible for Exchange alone to suck up too much bandwidth. Another thing which needs to be considered is how many IOs/sec the array is capable of. As long as the array and the SAN it is connected to are scaled correctly, heavy IO in other parts of the SAN environment should not impact your SQL performance.
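The single-strand point is easy to put a number on: classic Fibre Channel uses 8b/10b encoding, so only 8 of every 10 bits on the wire carry payload. A back-of-envelope sketch (simplified - 4GFC actually signals at 4.25 Gbaud, and frame/protocol overhead is ignored here):

```python
def fc_usable_mb_per_s(line_rate_gbps: int) -> float:
    """Approximate usable one-way bandwidth of a Fibre Channel link.
    8b/10b encoding leaves 80% of the line rate as payload bits;
    divide by 8 for bytes and by 1e6 for MB/s."""
    payload_bits = line_rate_gbps * 1_000_000_000 * 8 / 10
    return payload_bits / 8 / 1_000_000


# A single 4 Gb/s strand carries roughly 400 MB/s per direction -- every
# host behind it (Exchange, SQL, backups) shares that ceiling.
print(fc_usable_mb_per_s(4))  # 400.0
```

Which is why one busy neighbour on a single-strand array connection can be felt by everything else, while trunked, load-balanced fabrics make that much harder.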
Would requesting separate logical drives for different functions (data vs log vs tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
That is probably a matter of preference, and it also greatly depends on how your storage admins configure it. They could give you three LUNs in the same array or volume, in which case it's all the same anyway. If they gave you individual LUNs on different arrays, in different volumes (physically different disks), then it might be worth it for you to separate them.
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
I don't imagine your storage admin would change the RAID level in order to free up space. If he would, then he should probably be fired. Space concerns can lead to things being configured differently, but not normally in a performance-impacting way. They might just become a little more tight about how much space they give you. They might enable features such as data de-duplication (if the array supports it), which can hinder the performance of the array while the process runs, though not around the clock.
Best Answer
I'm sorry but I have to strongly disagree with ynguldyn; what they say might be true for other disk arrays, but this is an EVA and I'm a through-and-through EVA guy. We have dozens of them, all models and sizes, and there's literally zero benefit in breaking up disk-groups into smaller units; they're designed to be used as one big block and offer the best performance that way. In fact, splitting disk-groups slows EVAs down.
For example, one of my newer 8400s has 16 shelves of 12 x 450GB 15krpm FC disks. Using iozone, we tested it split into two disk-groups, and it was 20% slower across the board of tests compared to a single disk-group.
Also, the array isn't broken into RAID types at the disk-group level, so it's not RAID10 for everything; individual vdisks (LUNs) have a particular RAID level, but the disk-group itself only has the option to pre-allocate one or two hot standby disks (leave this at one, by the way). This way you can give DB logs, MSDTC, Quorum etc. LUNs R10 and choose how your data and backup LUNs are set up based on your performance requirements (personally I use R10 for all data and R5 for backups, but that's your choice). Oh, and it's much quicker, and safer, to keep all your different DB data types in separate LUNs, OK :)
If you have any follow-up questions regarding EVAs/XPs or other HP storage, feel free to come back to me, OK.