Theoretically, GlusterFS is an answer to your need.
Using GlusterFS, you can easily create RAID0-like (type cluster/distribute) and RAID1-like (type cluster/replicate) volumes, spread across many machines.
GlusterFS's architecture lets you stack translators so that you can create two distributed volumes, replicate them, and then access your distributed/replicated data through a single mount point.
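As a sketch of that stacking, a client-side volume file might layer cluster/replicate over two cluster/distribute volumes. The volume names and layout below are illustrative, not a tested production config:

```
# brick1..brick4 would be protocol/client volumes pointing at the
# four storage servers (definitions omitted for brevity)

# Two distributed (RAID0-like) sets, each spanning two bricks
volume distribute-a
  type cluster/distribute
  subvolumes brick1 brick2
end-volume

volume distribute-b
  type cluster/distribute
  subvolumes brick3 brick4
end-volume

# Replicate (RAID1-like) the two distributed sets,
# yielding one RAID10-like mount point
volume replicate
  type cluster/replicate
  subvolumes distribute-a distribute-b
end-volume
```

This is exactly the kind of stacked setup the mailing-list bug reports below concern, so test it thoroughly before trusting it with data.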
However, there have been reports of bugs appearing when users stack translators this way (see the GlusterFS mailing lists).
That's why I don't trust GlusterFS enough to set up RAID10-like volumes. (Since I haven't tested this setup enough, this is only an impression.)
Of course, simple RAID0-like and RAID1-like volumes seem production-ready.
Would Gluster help me with the I/O limits of hard drives if I chained a lot of machines together?
If it did, it would simply replace those limits with... network limits. I mean, seriously: if you distribute I/O over 50 machines, your network needs to handle that load. If you are running into I/O limits, not just storage capacity limits (i.e. terabytes), the proper solution is to invest in an I/O-focused storage solution. It is not as if they do not exist.
Can "Gluster" serve FLV files without any front-end server layers, just using the built-in HTTP protocol?
No. Gluster is a file protocol; it does not serve over HTTP ;)
I see Gluster as a mid-size, storage-oriented system. Yes, you get space without a lot of administration, and yes, you can use cheap machines, like putting big hard drives into every workstation and using them as a file-server type of thing.
But if you get LARGE (YouTube-scale) you may still need dedicated systems (even if they run Gluster), and if you run a LOT of I/O, then a SAN-style infrastructure with content-serving front ends (i.e. partitioning the files so that not every server has to cache everything) is logically the only solution. Gluster is smart, but it cannot do magic, and the cache on the actual serving server can be overloaded.
Moving I/O from the discs to the network puts very high requirements on the network. You may want to read up on InfiniBand; it is actually mentioned in the Gluster documents. But that said, what is the problem with I/O limits if a cheap (USD 1000 range) RAID controller can handle about 200 SATA discs? You do not need a lot of machines (at higher cost) to get around I/O limits, and the many-machines approach will always be more expensive than dedicated boxes (obviously you have to pay for more CPU, more RAM, etc.). And you won't save any discs with Gluster: fully redundant data storage is needed in both cases, with and without Gluster.
That said, using front-end servers is advisable for a LOT of reasons. Even if Gluster can manage without them, in a large installation it would be utterly stupid not to use separate front-end servers, preferably with a firewall in between.
Best Answer
The scalability issues in Gluster are related to the number of bricks, not the number of servers. Gluster, in general, scales linearly on common I/O patterns. The exceptions to this rule are file create operations and management operations; both cause Gluster to take a hit because the network overhead increases as the cluster grows.
There are a couple of areas to look at to determine how to increase performance. First, take a look at "iostat -dkx 30" and "iptraf" on the server. If the disk util% is on the high end, or the network bandwidth is close to saturating the interface, adding clients won't help. Your only options would be to add a server, add a network card, or replace the network card with one that can carry more bandwidth. The other option here is to add more io-cache space on the clients.
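One way to give clients more io-cache space is to raise the cache in the client-side volume file. The translator name and option names are real, but the subvolume name and sizes below are just example values to tune for your workload:

```
volume iocache
  type performance/io-cache
  option cache-size 256MB     # example value; the default is much smaller
  option cache-timeout 1      # seconds before cached data is revalidated
  subvolumes replicate        # whatever volume sits below the cache in your stack
end-volume
```

Remember this trades client RAM for server I/O, so size it against the memory the client can actually spare.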
Disk util% will also fall if you increase the amount of RAM available, as Linux likes to cache the file system. The next potential bottleneck is the disk itself. Run "top" and/or "iostat 5" to check the level of iowait. If it is high, faster disks or more disks may help.
Also check your clients for characteristics inhibiting the server's throughput, for example: CPU usage, network usage, memory usage, etc.