Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
In short, there probably isn't a way to be truly sure. What I would say (I am a SAN admin) is that if your applications are performing up to your expectations, don't worry about it. If you start to see performance issues that you believe could be related to SAN/disk IO performance, then it might be wise to inquire. I don't use much HP storage like you do, but in the IBM/NetApp world I can say from experience that there aren't many options which would allow you to configure it "poorly". Most enterprise storage these days takes a lot of the guesswork out of building RAID arrays and doesn't really let you do it wrong. Unless they are mixing drive speeds and capacities within the same RAID groups, you can rest assured in most cases that your disk is performing fine.
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Load testing should be plenty reliable. Just keep in mind that since the box is on a shared SAN/disk array, its performance can (and will) be affected by other systems using the same storage.
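If you do want hard numbers to compare across runs, even a crude script gives you a repeatable baseline. A minimal Python sketch (the file name and sizes are arbitrary; the OS page cache will flatter the numbers, and this is no substitute for sqlio or a proper load test):

```python
import os
import time

def measure_write_throughput(path, total_mb=64, block_kb=64):
    """Sequentially write total_mb of data in block_kb chunks and
    return MB/s. A crude sketch: it exercises one IO pattern only."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data through the cache toward the array
    elapsed = time.monotonic() - start
    os.remove(path)
    return total_mb / elapsed

if __name__ == "__main__":
    # Run several passes at different times of day: on shared storage,
    # the spread between runs tells you as much as the average does.
    for i in range(3):
        print(f"pass {i + 1}: {measure_write_throughput('bench_test.bin'):.1f} MB/s")
```

Running it at a quiet hour and again during the Exchange backup window is a cheap way to see how much the shared array actually moves under you.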
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
It can. It is not all about the disks, or which disks the servers are on. All of the data is being served up via a disk controller and then a SAN switch. The performance you see greatly depends on how the disk controller is connected to its corresponding disk shelves, and on the corresponding SAN. If the entire array connects to the backbone SAN on one single strand of 4Gbps fiber, then clearly performance will be impacted. If the array is connected across two redundant, load-balanced SANs using trunked links, then it would be nearly impossible for Exchange alone to suck up too much bandwidth. Another thing which needs to be considered is how many IOPS the array is capable of. As long as the array and the SAN it is connected to are scaled correctly, heavy IO in other parts of the SAN environment should not impact your SQL performance.
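The "how many IOPS is the array capable of" question lends itself to a back-of-envelope check. A sketch in Python; the per-spindle figures and RAID write penalties below are common rules of thumb, not vendor specifications:

```python
# Rough rule-of-thumb IOPS per spindle (assumed figures, not vendor specs):
IOPS_PER_DISK = {"7.2k SATA": 80, "10k SAS": 130, "15k SAS": 180}

def raid_effective_iops(disks, disk_type, read_pct, raid_write_penalty):
    """Estimate host-visible IOPS for a RAID group.

    Each write costs `raid_write_penalty` back-end IOs
    (RAID 10 -> 2, RAID 5 -> 4, RAID 6 -> 6).
    """
    raw = disks * IOPS_PER_DISK[disk_type]
    write_pct = 1 - read_pct
    return raw / (read_pct + write_pct * raid_write_penalty)

# 24 x 15k SAS disks, 70% reads, RAID 5:
print(round(raid_effective_iops(24, "15k SAS", 0.70, 4)))
```

If your combined SQL + Exchange + everything-else workload sits well under that estimate, contention is much less likely to be your problem.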
Would requesting separate logical drives for different functions (data vs. log vs. tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
That is probably a matter of preference, and also greatly depends on how your storage admins configure it. They could give you three LUNs in the same array or volume, in which case it's all the same anyway. If they gave you individual LUNs on different arrays, in different volumes (physically different disks), then it might be worth it for you to separate them.
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
I don't imagine your storage admin would change the RAID level in order to free up space. If he would, then he should probably be fired. Space concerns can lead to things being configured differently, but not normally in a performance-impacting way. They might just become a little tighter about how much space they give you. They might enable features such as data de-duplication (if the array supports it), which can hinder the performance of the array while the process runs, but not around the clock.
There are many ways of handling data that size. A lot of it depends on your environment and how much money you're willing to spend. In general there are a few overall 'get the data off the server' strategies:
- Over the Ethernet: Like it says on the box, data is streamed to Somewhere Else for handling. 20TB will take a long time to copy over 1GbE, but it can be done. Hardware can help (such as 10GbE links or, in some cases, NIC bonding).
- Over the storage subsystem: If you're on Fibre Channel, send it to another device on the FC network. If you've got SAS, send it to a SAS-attached device. Generally faster than Ethernet.
- To another disk array: Send it to another hunk of storage attached to the same server.
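For the over-the-Ethernet option, the copy time is worth working out before committing. A quick sketch (the 70% efficiency factor is an assumption covering protocol overhead; tune it for your network):

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Hours to move data_tb terabytes over a link_gbps link.

    `efficiency` is an assumed fudge factor for protocol overhead
    and real-world throughput, not a measured value.
    """
    data_bits = data_tb * 1e12 * 8
    return data_bits / (link_gbps * 1e9 * efficiency) / 3600

for gbps in (1, 10):
    print(f"{gbps:>2} GbE: {transfer_hours(20, gbps):.1f} hours")
```

Roughly 2.5 days for 20TB over 1GbE versus about 6 hours over 10GbE, which is why the hardware upgrades mentioned above matter so much at this scale.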
That's the 100km view. Once you start zooming in, things get a lot more fragmented. As already mentioned, LTO5 is a tape technology designed for these kinds of high-density loads. Another identical storage array is a good target, especially if you can use something like GlusterFS or DRBD to get the data over there. Also, whether you need a backup rotation or just the ability to keep running in case the array fails will affect what you put into place.
Once you've settled on a 100km-view method, getting into software will be the next big task. Factors influencing this are what you can install on your storage server in the first place (if it's a NetApp, that's one thing; a Linux server with a bunch of storage is another thing entirely, as is a Windows server with a bunch of storage), what hardware you pick (not all FOSS backup packages handle tape libraries well, for instance), and what kind of backup retention you require.
You really need to figure out what kind of disaster recovery you want. Simple live replication is easier, but doesn't allow you to restore from last week, only from just now. If the ability to restore from last week is important to you, then you need to design for that sort of thing. By law (in the US and elsewhere) some data needs to be preserved for 7+ years.
Simple replication is the easiest to do. This is what DRBD is designed for. Once the initial copy is done, it just sends changes. A complicating factor here is network locality; if your second array is not near the primary, DRBD may not be feasible. You'll need a second storage server with at least as much storage space as the first.
About tape backup...
LTO5 can hold 1.5TB of data without compression. Feeding these monsters requires very fast networking, either Fibre Channel or 6Gb SAS. Since you need to back up more than 1.5TB in a whack, you need to look into autoloaders (here is an example: link, a 24-slot, 1-drive autoloader from HP). With software that supports them, they'll handle changing tapes mid-backup for you. They're great. You'll still have to pull tapes out to send off-site, but that's a damn sight better than hanging around all night to load tapes yourself when the backup calls for them.
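To size an autoloader purchase, the tape count and streaming time for a 20TB full backup are easy to estimate. A sketch using LTO5's native figures (1.5TB capacity, roughly 140MB/s; compression will improve both):

```python
import math

LTO5_CAPACITY_TB = 1.5   # native capacity, no compression
LTO5_SPEED_MBPS = 140    # approximate native streaming rate, MB/s

def tapes_needed(data_tb):
    """Whole tapes required for a full backup of data_tb terabytes."""
    return math.ceil(data_tb / LTO5_CAPACITY_TB)

def backup_hours(data_tb, drives=1):
    """Hours to stream data_tb terabytes across `drives` tape drives."""
    mb = data_tb * 1e6
    return mb / (LTO5_SPEED_MBPS * drives) / 3600

print(tapes_needed(20))           # tapes for a 20 TB full backup
print(round(backup_hours(20), 1)) # hours at native speed, one drive
```

Fourteen tapes and a day and a half of streaming on a single drive is exactly the scenario where a 24-slot autoloader (or a second drive) earns its keep.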
If tape gives you the 'legacy, ew' heebie-jeebies, a Virtual Tape Library may be more your speed (such as this one from Quantum: link). These pretend to be tape libraries to backup software while actually storing things to disk with robust (you hope) de-duplication techniques. The fancier ones will even copy virtual tapes to real tapes for you, if you like that sort of thing, which can be very handy for off-site rotations.
If you don't want to muck about with even virtual tapes, but still want to do direct-to-disk backups, you'll need a storage array sized large enough to handle that 20TB plus however much net-change data you want to keep hold of. Different backup packages handle this differently. Some de-duplication technologies are really nice; others are hacky kludges. I personally don't know the state of FOSS backup software packages in this area (I've heard of Bacula), but they may be sufficient. A lot of commercial backup packages have local agents you install on servers to be backed up in order to increase throughput, which has a lot of merit.
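Sizing that direct-to-disk array is mostly arithmetic. A sketch; the change rate, retention window, and de-dup ratio below are placeholders you'd substitute from your own environment:

```python
def disk_pool_tb(primary_tb, daily_change_tb, retention_days, dedup_ratio=1.0):
    """Rough sizing for a direct-to-disk backup pool.

    Assumes one full copy plus incrementals kept for `retention_days`;
    `dedup_ratio` (e.g. 3.0 for 3:1) is vendor-dependent and assumed.
    """
    raw = primary_tb + daily_change_tb * retention_days
    return raw / dedup_ratio

# 20 TB primary, 200 GB/day of change, 30 days of retention:
print(disk_pool_tb(20, 0.2, 30))                              # 26.0 (no dedup)
print(round(disk_pool_tb(20, 0.2, 30, dedup_ratio=3.0), 1))   # 8.7  (at 3:1)
```

The gap between 26TB raw and under 9TB deduplicated is why the quality of a package's de-duplication matters so much at this scale.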
Best Answer
If you can afford it, a centralized storage system is ideal for safely and reliably serving your company's data.
That said, check your math: while it's fine to back servers and computers up to a NAS, it's almost never the best choice. Archives are also not optimal on a NAS. Archives and backups are best put on something else, like a tape drive or a deduplicated disk pool. Either way, you need something like that to back up the primary data stored on the NAS.