Linux Cloud Cluster Solutions for Scalable Web Services

Tags: cloud computing, cluster, linux, scalability, unix

I'm going to build a high-performance web service. It will use a database (or some other storage system), a processing language (scripting or compiled), and a web-server daemon. The system should be distributed across a large number of servers so that the service runs fast and reliably.

It should replicate data for reliability, and at the same time it must provide distributed computing features for processing large amounts of data (primarily queries on large databases that a single server could not execute with acceptable responsiveness). Caching techniques are out of scope for this question.
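To make the two requirements concrete, here is a minimal sketch of what is being asked for: every record is written to several replicas (reliability), and a query is fanned out to all shards in parallel and the partial results are merged (distributed processing). Everything in it is an assumption made for illustration: `Replica` and `Cluster` are hypothetical in-memory stand-ins for real servers, and the shard count, replica count, and CRC32-based placement are arbitrary choices, not a feature of any particular product.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical in-memory stand-in for one storage server."""

    def __init__(self):
        self.rows = {}
        self.alive = True

    def put(self, key, value):
        self.rows[key] = value

    def count_where(self, predicate):
        if not self.alive:
            raise ConnectionError("replica is down")
        return sum(1 for value in self.rows.values() if predicate(value))


class Cluster:
    """Toy cluster: data is split into shards, each shard has several replicas."""

    def __init__(self, num_shards=3, replicas_per_shard=2):
        self.shards = [
            [Replica() for _ in range(replicas_per_shard)]
            for _ in range(num_shards)
        ]

    def _shard_for(self, key):
        # crc32 gives the same result in every process, unlike hash() on str.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        # Replication: write the record to every replica of its shard.
        for replica in self._shard_for(key):
            replica.put(key, value)

    def _count_on_shard(self, shard, predicate):
        # Failover: use the first replica of the shard that answers.
        for replica in shard:
            try:
                return replica.count_where(predicate)
            except ConnectionError:
                continue
        raise RuntimeError("all replicas of a shard are down")

    def count_where(self, predicate):
        # Scatter-gather: query every shard in parallel, then merge the counts.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = pool.map(
                lambda shard: self._count_on_shard(shard, predicate),
                self.shards,
            )
            return sum(partials)


cluster = Cluster()
for i in range(100):
    cluster.put(f"post-{i}", {"score": i})

cluster.shards[0][0].alive = False  # simulate one failed server
print(cluster.count_where(lambda row: row["score"] >= 50))  # prints 50
```

The query still returns the full answer after one server is marked as failed, because the other replica of that shard holds the same rows. A real system also has to handle re-replication, consistency between replicas, and rebalancing, which this sketch ignores.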

Which cluster/cloud solutions should I consider?

There are plenty of Single-System-Image (SSI) solutions, clustering file systems (which can be part of the design), projects like Hadoop, BigTable clones, and many others. Each has its pros and cons, and the "about" page always says the solution is great 🙂 If you've tried to deploy something that addresses this problem, please share your experience!

UPD: It's not a file-hosting service and not a game, but something rather interactive. You can take ServerFault as an example of such a web service: small pieces of data, semi-static content, intensive database operations.


For those who might be interested:

Cross-Post on StackOverflow

Best Answer

Facebook uses Cassandra for data storage.

Here is an article about scaling YouTube and Google's architecture, and a presentation, "Designs, Lessons and Advice from Building Large Distributed Systems" by Jeff Dean of Google, describing how they do their thing.