At what point is it worth adding a CDN (content delivery network) to your website?
When one of the following occurs:
- You're reaching a large, international audience. Careful analysis of your audience shows that many of them are 100-300 ms round-trip time (RTT) away. You do the math and discover that a large group of your customers are getting a somewhat slow site, due to TCP/IP's so-so performance on links with a high bandwidth-delay product.
- You find that you have lots of requests for mostly static files, e.g. streaming video, audio, PDFs, images, etc. In fact, there are so many requests per second that they can't easily be handled by just setting up 2, 3, 4 or more servers dedicated to static file serving.
- You're a tech geek, and you set up a site using Amazon CloudFront or CacheFly just for the fun of it. Don't feel bad; I have done it too.
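The "do the math" step in the first point can be made concrete with a rough sketch. It assumes a single TCP connection limited by a classic 65,535-byte receive window (the maximum without the window scaling option). At most one window can be in flight per round trip, so throughput can never exceed window / RTT. The 100 Mbit/s link speed and the RTT values are illustrative assumptions, not measurements.

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on single-connection throughput in bits per second:
    at most one window of data can be sent per round trip."""
    return window_bytes * 8 / rtt_s


# Assumption: classic 64 KiB window, i.e. no TCP window scaling.
WINDOW = 65_535
LINK_BPS = 100e6  # assumed 100 Mbit/s path

for rtt_ms in (20, 100, 300):
    rtt = rtt_ms / 1000
    mbps = window_limited_throughput(WINDOW, rtt) / 1e6
    bdp_kb = bandwidth_delay_product(LINK_BPS, rtt) / 1000
    print(f"RTT {rtt_ms:>3} ms: max {mbps:5.2f} Mbit/s per connection, "
          f"BDP of the link = {bdp_kb:6.0f} kB")
```

At 300 ms the window-limited ceiling is about 1.75 Mbit/s per connection, while filling the assumed link would need roughly 3,750 kB in flight. Modern stacks use window scaling, so the hard 64 KiB cap is largely historical. Slow start and loss recovery, however, still take longer as RTT grows, and a nearby CDN edge shortens exactly that RTT.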
I have repeatedly seen articles where SimpleCDN didn't do so great. It is really hard to objectively quantify the performance of the various CDNs, but here is one attempt. Maybe I'm being unfair to SimpleCDN here, but they wouldn't be my first choice.
Amazon CloudFront is pretty consistently good ... not great, but cheap and easy to get started with.
Edit: Akamai still seems to be the very best CDN, expensive but so worth it. See SmugMug's recent presentation, slide 7 in the PDF, or the more detailed version in the video. I have never worked with Akamai; I have always dismissed them as obviously too expensive for the sites I have worked on. Maybe that is beginning to change, I don't know, but they are trying to lower the barrier to entry to their CDN service.
This answer has been edited after the question was clarified.
What are the other reasons that lead clouds to prefer DAS?
Where "DAS" means Direct Attached Storage, i.e. SATA or SAS hard disk drives.
Cloud vendors all use DAS because it offers order-of-magnitude improvements in price/performance. It is a case of scaling horizontally.
In short, SATA hard disk drives and SATA controllers are cheap commodities. They are mass-market products, and are priced very low. By building a large cluster of cheap PCs with cheap SATA drives, Google, Amazon and others obtain vast capacity at a very low price point. They then add their own software layer on top. Their software does multi-server replication for performance and reliability, monitoring, re-balancing replication after hardware failure, and other things.
You could take a look at MogileFS as a simpler representative of the kind of software that Google, Amazon and others use for storage. It's a different implementation of course, but it shares many of the same design goals and solutions as the large-scale systems. If you want to, here is a jumping-off point for learning more about the Google File System (GFS).
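The replication and re-balancing layer described above can be sketched as a toy model. It shows the core idea: every file lives on several distinct cheap nodes, and when a node dies, the files it held are copied to other nodes until the replica target is met again. Everything here is hypothetical. `Cluster`, `put`, and `fail_node` are illustrative names rather than Google's, Amazon's, or MogileFS's actual APIs. Placement is random, and real systems also handle rack awareness, checksums, and concurrent failures, which this sketch ignores.

```python
import random


class Cluster:
    """Toy model of a replicated blob store on cheap DAS nodes.
    Hypothetical names; only tracks which node holds which file id."""

    def __init__(self, node_names, replicas=3, seed=0):
        self.replicas = replicas
        self.nodes = {name: set() for name in node_names}  # node -> file ids
        self.rng = random.Random(seed)

    def locations(self, file_id):
        return {n for n, files in self.nodes.items() if file_id in files}

    def _place(self, file_id):
        """Add copies on distinct nodes until the replica target is met."""
        have = self.locations(file_id)
        candidates = [n for n in self.nodes if n not in have]
        needed = self.replicas - len(have)
        if needed > len(candidates):
            raise RuntimeError("not enough live nodes for the replica target")
        for node in self.rng.sample(candidates, max(needed, 0)):
            self.nodes[node].add(file_id)

    def put(self, file_id):
        self._place(file_id)

    def fail_node(self, name):
        """A disk or server dies: drop it and re-replicate what it held."""
        lost = self.nodes.pop(name)
        for file_id in lost:
            if not self.locations(file_id):
                raise RuntimeError(f"{file_id}: no surviving copy to copy from")
            self._place(file_id)


cluster = Cluster([f"node{i}" for i in range(6)])
for i in range(100):
    cluster.put(f"file{i}")
cluster.fail_node("node0")  # files held by node0 get a new third copy elsewhere
print(min(len(cluster.locations(f"file{i}")) for i in range(100)))
```

The point of the design is that no single disk matters: losing a node costs some background copying, not data. That is why cheap, failure-prone SATA hardware is acceptable underneath.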
As stated later in the paper, clouds should use SAN or NAS because DAS is not appropriate when a VM moves to another server.
There are two reasons why SANs are not used.
1) Price.
SANs are hugely expensive at large scale. While they may be the technically "best" solution, they are typically not used in very large-scale installations due to the cost.
2) The CAP Theorem
Eric Brewer's CAP theorem says that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. At very large scale, network partitions and hardware failures are routine, so you cannot maintain strong consistency while keeping acceptable reliability, fault tolerance, and performance. SANs are an attempt at providing strong consistency in hardware. That may work nicely for a 5,000-server installation, but it has never been shown to work for Google's 250,000+ servers.
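Quorum replication is one common way to see the CAP trade-off in miniature. This is an illustration added here, not something the CAP theorem itself prescribes. With `n` replicas, a write waits for `w` acknowledgements and a read asks `r` replicas. When `r + w > n`, every read set overlaps the latest acknowledged write set, which is the consistent choice. The price appears during a partition: a client that reaches fewer than `w` replicas must refuse the write. `QuorumConfig` and its methods are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass
class QuorumConfig:
    n: int  # total replicas
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def reads_see_latest_write(self) -> bool:
        # Any read set and any write set share at least one replica
        # exactly when r + w > n (pigeonhole principle).
        return self.r + self.w > self.n

    def can_write(self, reachable: int) -> bool:
        return reachable >= self.w

    def can_read(self, reachable: int) -> bool:
        return reachable >= self.r


strict = QuorumConfig(n=3, w=2, r=2)  # consistent, but needs a majority
loose = QuorumConfig(n=3, w=1, r=1)   # stays available, may read stale data

# A partition leaves this client able to reach only 1 of the 3 replicas.
for name, cfg in (("strict", strict), ("loose", loose)):
    print(name, "consistent:", cfg.reads_see_latest_write(),
          "write ok:", cfg.can_write(1), "read ok:", cfg.can_read(1))
```

Neither configuration avoids the trade-off. The strict one gives up availability on the minority side of the partition, and the loose one gives up the guarantee that reads see the latest write.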
Result:
So far, the cloud computing vendors have chosen to push the complexity of maintaining server state onto the application developer. Current cloud offerings do not provide consistent state for each virtual machine. Application servers (virtual machines) may crash at any time, and their local data may be lost.
Each vendor then has its own implementation of persistent storage, which you're supposed to use for important data. Amazon's offerings are nice examples: MySQL, SimpleDB, and Simple Storage Service (S3). These offerings themselves reflect the CAP theorem: the MySQL instance has strong consistency but limited scalability, while SimpleDB and S3 scale fantastically but are only eventually consistent.
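"Eventually consistent" means a read can briefly return old data after a write has succeeded. A minimal sketch of why, assuming one primary copy and one replica that is updated later by a background replication step. `EventuallyConsistentStore` and its methods are hypothetical names, and this is not how SimpleDB or S3 are actually built.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy two-copy store: writes land on the primary immediately and
    reach the replica only when replicate() runs. Hypothetical design."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # replication log not yet applied

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def get(self, key, from_replica=True):
        store = self.replica if from_replica else self.primary
        return store.get(key)

    def replicate(self):
        """Background propagation; once the log drains, the copies agree."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.put("profile:42", "v1")
store.replicate()
store.put("profile:42", "v2")
print(store.get("profile:42"))  # "v1": the replica hasn't caught up yet
store.replicate()
print(store.get("profile:42"))  # "v2": the copies have converged
```

The application developer has to tolerate the window between those two reads, which is exactly the complexity the vendors push up the stack in exchange for scale.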
Edit: I'll give you a concrete example. Amazon S3 says that your data will be replicated; I assume that means automatic failover if they lose an active site. (Although I would read the details of my SLA if I were buying from them.) They also say that you have to pick your region. So, for Amazon S3, the answers are (1) Yes and (2) No.
But that's just for Amazon S3. Google may have different answers. Azure may have different answers. Rackspace may have different answers. Any of them may have multiple answers based on your checking account. There is no single answer for the broad category of "cloud storage."
Further edit: Amazon has a beta service called CloudFront that offers a "Yes" to your second question.