I would suggest RAID10 for heavy database work that isn't mainly read-only, not RAID5 or 6. On a 4-drive RAID6 array each block write (or partial block write) can turn into a read (to fetch the other data block in the stripe) followed by three writes (one for the new data block, two for the two parity blocks), which can have a significant write performance impact. With RAID10 each (partial) block write is just two writes to disk.
If your database workload included very few writes then RAID6 might be preferable for redundancy, as a RAID6 array can survive any two drives failing, whereas a four-drive RAID10 array will only survive 4 of the 6 possible combinations of two failed drives. But you state that you expect the activity to be both read and write intensive (and you have good backup and disaster recovery plans, right?).
Obviously make sure you go for a 64-bit edition of Windows and SQL Server, otherwise you'll not be able to use all that RAM without performance-draining hacks like PAE. While we are on memory: I would suggest more if you can. RAM is not expensive these days, even good name-brand ECC-capable RAM, so going up from 8GB to however much you can fit on the board isn't a bad idea. For most applications in SQL Server (with big enough databases) you will notice the benefit under any appreciable load, as it greatly reduces I/O operations for read queries when a larger working set can be held in memory. From your description it sounds like you have a data size and activity pattern that would benefit from however much RAM you can practically throw at it. Many new servers support 16GB of RAM these days, and 32GB is not uncommon (and increasingly neither is support for 64GB, though that is getting into the "specialist" market so you may have to pay a premium for the extra).
I've found that when I've had to tune for lower latency versus throughput, I've tuned nr_requests down from its default (to as low as 32). The idea is that smaller batches mean lower latency.
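For example, a minimal sketch of what that looks like (sdb is just a placeholder for your device; run as root):

    # Check the current queue depth (typically 128 by default)
    cat /sys/block/sdb/queue/nr_requests

    # Lower it to favour latency over throughput
    echo 32 > /sys/block/sdb/queue/nr_requests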
Also, for read_ahead_kb I've found that increasing this value offers better throughput for sequential reads/writes, but this option really depends on your workload and IO pattern. For example, on a database system that I recently tuned I changed this value to match a single DB page size, which helped to reduce read latency. Increasing or decreasing it beyond that value proved to hurt performance in my case.
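As a sketch, assuming an 8KB database page size (SQL Server uses fixed 8KB pages and PostgreSQL defaults to 8KB, for example) and the same placeholder device:

    # Read ahead one database page (8KB) at a time
    echo 8 > /sys/block/sdb/queue/read_ahead_kb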
As for other options or settings for block device queues (there's a combined sketch applying these after the list):
max_sectors_kb = I've set this value to match what the hardware allows for a single transfer (check the value of the max_hw_sectors_kb (RO) file in sysfs to see what's allowed)
nomerges = this lets you disable or adjust the lookup logic for merging IO requests. (Turning this off can save you some cpu cycles, but I haven't seen any benefit when changing this for my systems, so I left it at the default.)
rq_affinity = I haven't tried this yet, but here is the explanation behind it from the kernel docs:

    If this option is '1', the block layer will migrate request completions to the
    cpu "group" that originally submitted the request. For some workloads this
    provides a significant reduction in CPU cycles due to caching effects.

    For storage configurations that need to maximize distribution of completion
    processing setting this option to '2' forces the completion to run on the
    requesting cpu (bypassing the "group" aggregation logic)
scheduler = you said that you tried deadline and noop. I've tested both, but deadline wins out in the testing I've done most recently for a database server. NOOP performed well, but I was still able to achieve better performance by adjusting the deadline scheduler.
Options for the deadline scheduler are located under /sys/block/{sd,cciss,dm-}*/queue/iosched/ :
fifo_batch = kind of like nr_requests, but specific to the scheduler. The rule of thumb is to tune this down for lower latency or up for throughput. It controls the batch size of read and write requests.
write_expire = sets the expiry time for write batches; the default is 5000ms. Again, decreasing this value decreases your write latency, while increasing it increases throughput.
read_expire = sets the expiry time for read batches; the default is 500ms. The same rules apply here.
front_merges = I tend to turn this off, and it's on by default. I don't see the need for the scheduler to waste cpu cycles trying to front-merge IO requests.
writes_starved = since deadline is geared toward reads, the default here is to process 2 read batches before a write batch is processed. I found the default of 2 to be good for my workload.
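Putting the queue settings and deadline tunables above together, here's a minimal sketch (sdb is a placeholder device, the values are illustrative latency-leaning choices rather than recommendations, and everything needs root):

    Q=/sys/block/sdb/queue

    # Cap single transfers at what the hardware reports it can do
    cat $Q/max_hw_sectors_kb > $Q/max_sectors_kb

    # Disable merge lookups entirely (0 = merge, 1 = simple merges only, 2 = none)
    echo 2 > $Q/nomerges

    # Complete requests on the CPU "group" that submitted them
    echo 1 > $Q/rq_affinity

    # Switch to the deadline scheduler (reading this file shows the active one in brackets)
    echo deadline > $Q/scheduler

    # Deadline tunables, leaning toward lower latency
    echo 8    > $Q/iosched/fifo_batch      # down from the default of 16
    echo 250  > $Q/iosched/read_expire     # default 500ms
    echo 2500 > $Q/iosched/write_expire    # default 5000ms
    echo 0    > $Q/iosched/front_merges    # skip front-merge lookups
    echo 2    > $Q/iosched/writes_starved  # keep the default read bias

Note that none of this persists across a reboot, so reapply it from an init script or a udev rule.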
Best Answer
This is an HP ProLiant server with a Smart Array P420i RAID controller. My immediate advice is to not change any of the default configuration settings unless you have a very specific reason to...
In short, don't worry about it.
The concept of sectors/tracks in the context of this controller isn't very useful; disk geometry here sits under lots of layers of abstraction, and it's meaningless unless you have a specific alignment issue to tackle or need a larger boot volume on a legacy OS. Enabling the Max Boot feature (>4GB boot volume) on the Logical Drive increases the sector count from 32 to 63.
I wouldn't touch it.
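If you just want to see what the controller currently reports, without changing anything, HP's hpacucli tool (ssacli on newer systems) can dump the full configuration:

    hpacucli ctrl all show config detail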