Sanity check on 40TB server configuration

Tags: sas, software-raid, zfs

I've got 40 years in computing, but I've never had to build a server quite like this one, so this might be a n00b question.

I have a client that is going to offer ultra-high-def music files for download. In this case that means FLAC-compressed 24-bit/192kHz, which works out to roughly 10GB per album. (No, I don't want to discuss the desirability of the product, just the server configuration.) The catalog will be about 3,000 albums, with both ultra-high and low-def versions (for their iPods, I guess), giving about 35-40TB of primary data.

Since this is a very specialized product, the market size is relatively small (think: people who spend $20,000+ on their audio systems), which means the server is going to be idle (or close to it) most of the time. I have what looks like a good colocation offer from ColocationAmerica with a 1Gbps connection and bandwidth at about $20/TB, so now I just have to build a box to deliver the goods.

The data-access use case is write-once / read-many, so I'm thinking of just using software RAID 1 for pairs of drives. This would allow me (I think) to reconfigure spare drives to replace failed ones on the fly, and thereby start the rebuild of the second drive before some sysadmin notices the red light on the system (the colo does free swap-outs). It would be great if I could get most of the drives to sleep/spin down when they aren't needed, which will be most of the time for most of the drives.
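
To make that concrete, here is roughly what I have in mind per pair, assuming Linux md RAID; the device names and mount points are invented, and the spin-down part is the bit I'm least sure about:

    # One mirror pair plus a hot spare (all device names are placeholders)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
          /dev/sdb /dev/sdc /dev/sdz
    mkfs.ext4 /dev/md0
    mount /dev/md0 /srv/vol00

    # Spin-down: hdparm idle timers only apply to SATA; for the SAS drives I'm
    # looking at it would be sdparm / controller power management instead,
    # e.g. an immediate stop of an idle member:
    sdparm --command=stop /dev/sdc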

I don't need much in the way of compute power—this thing is just shoving fat-objects down the pipe—and so the CPU/motherboard can be pretty modest so long as it can support this number of drives.

I'm currently considering the following configuration:

Chassis: Supermicro CSE-847E26-RJBOD1
Drives: 30 4TB SAS drives (Seagate ST4000NM0023 ?)
MB: SUPERMICRO MBD-X10SAE-O w/ 8GB
CPU: Xeon E3-1220 v3 3.1GHz LGA 1150 80W quad-core

So, am I going in the right direction, or is this a completely n00b / dinosaur way of approaching the problem?

Update to clarify a couple of points:

  1. I have no experience with ZFS, since the last Sun product I owned was back in the late '80s. I will do a little RTFMing to see if it feels right.
  2. I don't really need the filesystem to do anything spectacular, since the file names are going to be simple UUIDs and the objects are going to be balanced across the drives (sort of like a large caching system; there's a rough sketch of that idea after this list). So I really was thinking of these as 40 separate filesystems, and that made RAID 1 sound about right (but I admit ignorance here).
  3. Our current expectation is that we're unlikely to have more than a couple dozen files downloading at any one time, and in most cases there will be exactly one person downloading any given file, so I don't know if we need tons of memory for buffers. Maybe 8GB is a bit light, but I don't think 128GB will do anything more than consume energy.
  4. There are 2 separate machines not mentioned here: their current web store, and an almost completely decoupled Download Master that handles all authentication, new product ingest management, policy enforcement (after all, this is the RIAA's playground), ephemeral URL creation (and possibly handing downloads off to more than one of these beasts if the traffic exceeds our expectations), usage tracking, and report generation. That means this machine could almost be built using gerbils on Quaaludes.
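
To illustrate point 2, the "balancing" I have in mind is nothing fancier than hashing the UUID to pick a volume, along these lines (the volume count, paths, and UUID are all made up for the example):

    # Map an object UUID to one of N volumes (say 15, one per mirror pair)
    uuid="0f8c2a34-9b1d-4c6e-8a7f-3d5e6b9c1a2b"
    nvols=15
    hash=$(printf '%s' "$uuid" | sha1sum | cut -c1-8)   # first 32 bits of SHA-1
    idx=$(( 0x$hash % nvols ))
    printf '/srv/vol%02d/%s.flac\n' "$idx" "$uuid"      # prints /srv/vol<NN>/<uuid>.flac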

ZFS? Where's the benefit?

OK, I'm slogging my way through multiple ZFS guides, FAQs, etc. Forgive me for sounding stupid, but I'm really trying to understand the benefit of using ZFS over my antediluvian notion of N RAID 1 pairs. On this Best Practices page (from 2006), they even suggest not building one 48-device ZFS vdev, but rather 24 2-device mirrors, which sounds kind of like what I was talking about doing. Other pages mention the number of devices that have to be accessed in order to deliver 1 (one) ZFS block. Also, please remember, at 10GB per object and 80% disk utilization, I'm storing a grand total of 320 files per 4TB drive. My rebuild time with N RAID 1s, for any given drive failure, is a 4TB write from one device to another. How does ZFS make this better?
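
If I'm reading the guides right, the ZFS equivalent of my pile of RAID 1 pairs is a single pool striped across 2-way mirror vdevs, something like this (device names invented, and only the first few pairs shown):

    # One pool built from 2-way mirrors: structurally like N RAID 1 pairs,
    # but with a single namespace, checksums, scrubs, and shared hot spares.
    zpool create tank \
        mirror sdb sdc \
        mirror sdd sde \
        mirror sdf sdg    # ...and so on for the remaining pairs
    zpool add tank spare sdy sdz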

I'll admit to being a dinosaur, but disk is cheap, RAID 1 I understand, my file management needs are trivial, and ZFS on Linux (my preferred OS) is still kind of young. Maybe I'm too conservative, but when I'm looking at a production system, that's how I roll.

I do thank all of you for your comments that made me think about this. I'm still not completely decided and I may have to come back and ask some more n00b questions.

Best Answer

Based on your problem description, your issue isn't so much the server as the storage.
You want a reliable, robust filesystem like ZFS that's designed to handle large storage capacities well and has built-in management capabilities that make that end of the system easier to run.

As was mentioned in the comments, I'd go with ZFS for the storage pool, probably on FreeBSD because that's the operating system I'm most familiar with and because it has a long, proven track record of solid ZFS performance. My second-choice OS would be Illumos, again because of its well-tested ZFS support.
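
To give a rough idea of what "easier to manage" means in practice, the routine care and feeding of such a pool is a handful of commands (the pool and device names below are just placeholders):

    zpool status tank               # health of every mirror at a glance
    zpool scrub tank                # verify every block against its checksum
    zpool replace tank da5 da30     # swap out a failing disk
    zpool list tank                 # capacity/usage summary

Note that a resilver after "zpool replace" only copies the blocks that are actually in use, so with your mostly-empty 4TB mirrors the rebuild is far less than a full 4TB write.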


As far as serving up the files goes, I agree: you don't need much in terms of hardware just to push data out the network port. Your primary driver for CPU/RAM is going to be the needs of the filesystem (ZFS).
The general rule of thumb is that ZFS needs 1GB of RAM plus another 1GB for every 10TB of disk space it manages (so for 40TB you'd need about 5GB of RAM for ZFS). The relationship isn't quite linear, though; there are plenty of good ZFS books/tutorials/docs that can help you come up with an estimate for your environment.
Note that adding in ZFS bells and whistles like deduplication will require more RAM.

Obviously round RAM requirements up rather than down and don't be stingy: If your math says you need 5GB of RAM don't load the server with 8GB -- step up to 16GB.
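
If you do want to cap how much RAM the ZFS cache (the ARC) grabs, that's a single tunable; on FreeBSD, for example, something like this in /boot/loader.conf (the 8GiB figure is just an illustration):

    # /boot/loader.conf -- cap the ZFS ARC at 8 GiB (value is in bytes)
    vfs.zfs.arc_max="8589934592"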

You can then either run your server software directly on the storage box (which means you're going to need even more RAM on that box to support the server processes), or you can remote-mount the storage on "front-end" servers that actually serve the client requests.
(The former is cheaper initially; the latter scales better long-term.)
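
If you go the front-end route, ZFS makes the export side simple as well; a rough sketch, with invented host and dataset names:

    # On the storage box: create a dataset and share it over NFS
    # (the NFS server daemons need to be enabled on that box, of course)
    zfs create tank/hidef
    zfs set sharenfs=on tank/hidef

    # On a front-end server: mount it and serve from there
    mount -t nfs storage1:/tank/hidef /srv/hidef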


Beyond this advice, the best suggestions I can give you are already well covered in our Capacity Planning series of questions -- basically "Load Test, Load Test, Load Test".