Given the following parameters, how can I estimate disk subsystem requirements?
The environment is a public webserver for delivery of large files to many concurrent users.
- Total file set size (in GB)
- Total file set size (# files)
- Working file set size (in GB)
- Working file set size (# files)
- Average file size
- Average client download speed
- Expected concurrent clients
Although these are large files, I imagine I should estimate for somewhere between pure random and pure sequential IO: with fewer/faster clients I guess it will tend towards sequential, and with more/slower clients it will tend towards random. Hopefully that's roughly correct?
So my thinking is to first calculate "expected IOPS". This is what I'm stuck on. I'm assuming I should be able to get close using the following parameters: working set size, average client download speed, and expected concurrent clients.
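To make the arithmetic concrete, here's a minimal sketch of the kind of calculation I have in mind. The 64 KB per-IO read size and the example numbers are my own assumptions, and it deliberately treats every read as a separate random IO (the pessimistic case):

```python
def estimate_iops(concurrent_clients, avg_client_speed_mbps, io_size_kb=64):
    """Rough worst-case IOPS, assuming every io_size_kb read is a separate IO.

    io_size_kb=64 is an assumed per-request read size, not a measured value.
    """
    # Aggregate demand: clients * per-client speed, converted Mbps -> MB/s.
    aggregate_mb_s = concurrent_clients * avg_client_speed_mbps / 8
    io_size_mb = io_size_kb / 1024
    return aggregate_mb_s / io_size_mb

# Example: 500 clients each pulling 5 Mbps, read in 64 KB chunks.
print(round(estimate_iops(500, 5)))  # 5000
```

Larger effective reads (read-ahead, bigger request sizes) would divide that figure down proportionally, which is why I think the zero-read-ahead number is a sane pessimistic baseline.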
From there, I can look at the IOPS ratings of disks and RAID controllers and come up with a rough estimate of the disk subsystem required to serve the file set to that many users.
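For the second step, here's roughly how I picture turning a required-IOPS figure into a spindle count. The per-disk IOPS numbers below are ballpark figures I've seen quoted, not vendor specs, and I'm assuming random reads scale roughly linearly across a striped array:

```python
import math

# Illustrative per-spindle random-read IOPS (assumptions, not vendor specs).
PER_DISK_IOPS = {"7.2k SATA": 80, "10k SAS": 140, "15k SAS": 180}

def disks_needed(required_iops, disk_type):
    """Minimum spindles, assuming reads scale linearly across the stripe."""
    return math.ceil(required_iops / PER_DISK_IOPS[disk_type])

# Example: the 5000 IOPS figure from above.
print(disks_needed(5000, "15k SAS"))   # 28
print(disks_needed(5000, "7.2k SATA")) # 63
```

I realise the RAID level matters too (read IOPS scale differently from writes, and a mirror can serve reads from both copies), but for a read-heavy workload this seems like a reasonable first cut.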
Obviously there is more to it, such as read-ahead, the amount of RAM available for caching, filesystem block size, RAID stripe width, etc., but I figure if I base it on zero read-ahead and zero RAM, that should give me a rough pessimistic estimate.
Can someone with experience in this field please let me know if I'm on the right track, and/or offer any advice on how to calculate some of these values?
If there are sites that discuss this or books I can buy, I'm very willing to read them, but I've been searching for two days without much luck. I'm a bit out of my depth when it comes to storage.
I also understand I'll have to benchmark to get a proper answer, but I'd like to do as much estimation as I possibly can first.
All help appreciated, flames welcomed!