A smaller cluster size means that a file will be spread across more clusters (obviously). That means potentially more fragmentation and possibly more lookups to find the clusters. It's the usual speed vs. size trade-off. Since hard disks are cheap, I would go for larger cluster sizes, but you probably won't see that much difference either way ...
I'm not a distributed file system ninja, but after consolidating as many drives as I can into as few machines as I can, I would try using iSCSI to connect the bulk of the machines to one main machine. There I could consolidate everything into, hopefully, fault-tolerant storage. Preferably fault tolerant both within a machine (if a drive goes out) and among machines (if a whole machine is powered off).
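Roughly, that means each storage box exports its disks as iSCSI targets and the main machine logs in to all of them. A minimal sketch on Linux with the LIO target (`targetcli`) and `open-iscsi`; the device paths, IQNs and IP addresses below are made-up examples, and ACL setup is omitted:

```shell
# On each storage box: export a whole disk as an iSCSI target.
targetcli /backstores/block create name=disk0 dev=/dev/sdb
targetcli /iscsi create iqn.2012-01.local.box1:disk0
targetcli /iscsi/iqn.2012-01.local.box1:disk0/tpg1/luns create /backstores/block/disk0

# On the main machine: discover each box's targets and log in.
iscsiadm -m discovery -t sendtargets -p 192.168.1.11
iscsiadm -m node -T iqn.2012-01.local.box1:disk0 -p 192.168.1.11 --login
# The remote disk now shows up locally as another /dev/sdX block device.
```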
Personally I like ZFS. In this case, the built-in compression, dedupe and fault tolerance would be helpful. However, I'm sure there are many other ways to compress the data while keeping it fault tolerant.
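For the ZFS route, compression and dedupe are just pool/dataset properties. A sketch, assuming a hypothetical pool named "tank" built from three of the consolidated disks (device names are placeholders):

```shell
# Build a single-parity raidz pool from three disks, then turn on the space savers.
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd
zfs set compression=on tank    # cheap and almost always worth it
zfs set dedup=on tank          # needs lots of RAM to hold the dedup table
zfs get compressratio tank     # check how much you're actually saving
```

Note that dedupe in particular is RAM-hungry; compression alone is often the safer default.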
Wish I had a real turnkey distributed file system solution to recommend. I know this is really kludgy, but I hope it points you in the right direction.
Edit: I am still new to ZFS and setting up iSCSI, but I recalled seeing a video from Sun in Germany where they were showing the fault tolerance of ZFS. They connected three USB hubs to a computer and put four flash drives in each hub. Then, to prevent any one hub from taking the storage pool down, they made RAIDz volumes, each consisting of one flash drive from each hub. Then they striped the four ZFS RAIDz volumes together into one pool. That way only four flash drives were used for parity. Next, of course, they unplugged one hub, which degraded every RAIDz volume, but all the data was still available. In this configuration up to four drives could be lost, but only if no two of them were in the same RAIDz volume.
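The layout in the demo can be sketched as a single `zpool create`; the device names here are invented stand-ins for the twelve flash drives (hub a/b/c, stick 1-4), with one stick from each hub per raidz vdev:

```shell
# Four 3-drive raidz vdevs, striped together into one pool.
# Each vdev spans all three hubs, so losing a hub costs each vdev only one drive.
zpool create demo \
    raidz hub_a1 hub_b1 hub_c1 \
    raidz hub_a2 hub_b2 hub_c2 \
    raidz hub_a3 hub_b3 hub_c3 \
    raidz hub_a4 hub_b4 hub_c4

zpool status demo   # after pulling a hub: every vdev DEGRADED, pool still readable
```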
If this configuration was used with the raw drives of each box, then that would preserve more drives for data rather than parity. I heard FreeNAS can (or was going to be able to) share drives in a "raw" manner via iSCSI, so I presume Linux can do the same. As I said, I'm still learning, but this alternate method would be less wasteful from a parity standpoint than my previous suggestion. Of course, it relies on using ZFS, which I don't know would be acceptable. I know it is usually best to stick with what you know if you are going to have to build/maintain/repair something, unless this is a learning experience.
Hope this is better.
Edit: Did some digging and found the video I spoke about. The part where they explain spreading the USB flash drives over the hubs starts at 2m10s. The video demos their storage server "Thumper" (X4500) and how to spread the disks across controllers, so that if you have a hard disk controller failure your data will still be good. (Personally I think this is just a video of geeks having fun. I wish I had a Thumper box myself, but my wife wouldn't like me running a pallet jack through the house. :D That is one big box.)
Edit: I remembered coming across a distributed file system called OpenAFS. I haven't tried it; I've only read a bit about it. Perhaps others know how it handles in the real world.
Best Answer
ZFS-on-Linux has a feature called "on-line deduplication".
UPD.: I've re-read your question once again, and now it looks like Aufs could be of help to you. It's a very popular solution for hosting environments. And actually I can mention Btrfs myself now as well: the pattern is that you have some template subvolume, which you snapshot every time you need another instance. It's COW, so only changed file blocks need more space. But keep in mind that Btrfs is, ergh… well, not too stable anyway. I'd use it in production only if it's absolutely okay for the data on it to be lost.
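The Btrfs template pattern is just subvolume snapshots; a minimal sketch (the `/srv/...` paths are hypothetical):

```shell
# Create the template subvolume once and populate it with the shared files.
btrfs subvolume create /srv/template
# ... copy the common data into /srv/template ...

# Each new instance is a writable snapshot: near-instant, and it shares
# all blocks with the template until a file is actually modified (COW).
btrfs subvolume snapshot /srv/template /srv/instance1
btrfs subvolume snapshot /srv/template /srv/instance2
```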