So, here are a few points that may help you out.
- Provider: For the front end, if cost is your main concern, you will have to work out which provider suits your needs. Reliability, cost, and scaling are all factors you will need to consider.
- Note: unless you have the user run some kind of client-side program (Flash, JS, etc.), your servers will have to receive the file and then upload it to S3 on the user's behalf. This adds a lot of load as well as bandwidth costs. However, it also gives you much better control over what can be uploaded and how. Once you hand control over to the client, you will not be able to truly control what gets uploaded.
- S3 is great for storing static content, and it will be key to building a site like this while keeping costs in line. Make sure you properly control who has upload permissions to which buckets. For example, if you have CSS and JavaScript in one bucket, only you should be able to upload to that location; otherwise a malicious user could upload some nasty files to replace your content. On the other hand, if you are going to allow users to upload content directly to save on bandwidth, you will have to make sure it goes to a separate bucket, ideally one per user. This is not trivial to enforce, and nearly impossible if you give the client direct upload access.
Depending on your upload configuration (Client Side Client vs Server Side Client) your needs will be different. Client Side will be cheaper up front for server costs, but be aware that someone will probably find a way to store any kind of file and you will be responsible for moderating that content. For the Server Side model, be prepared to have your server costs increase with user traffic as you will need to build out more servers to handle upload requests.
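To make the "control over what can be uploaded" point concrete, here is a minimal sketch of the kind of check a server-side upload handler could run before forwarding a file to S3. The extension allowlist, the size cap, and the `uploads/<user_id>/` key layout are all assumptions made up for illustration, not anything S3 requires.

```python
import posixpath
import re

# Hypothetical policy values, chosen only for illustration.
ALLOWED_EXTENSIONS = {".jpg", ".png", ".gif"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MiB


def build_object_key(user_id: int, filename: str, size_bytes: int) -> str:
    """Validate an upload and return a per-user object key, or raise ValueError."""
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file size outside allowed range")

    # Keep only the last path component so "../../site.css" cannot escape the prefix.
    base = posixpath.basename(filename.replace("\\", "/"))
    # Replace anything outside a conservative character set.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base)

    ext = posixpath.splitext(safe)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension {ext!r} is not allowed")

    # Every user writes under their own prefix, separate from site assets.
    return f"uploads/{user_id}/{safe}"
```

A real handler would also want to inspect the file contents rather than trust the extension, but even this much is hard to guarantee once the client uploads directly.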
Once you have the content hosted you will also want to look into a CDN (Content Delivery Network) such as Amazon's CloudFront (if you want to stay on the Amazon stack) or Akamai. These will increase your costs at first, but save you money on high-usage content.
Amazon SimpleDB is an interesting style of database. It is 'eventually consistent', which means that data written to the database may not be immediately readable, similar to Amazon S3. If you are going to use the database to keep data synced across multiple nodes for many real-time transactions, I would not recommend it.
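To illustrate what 'eventually consistent' means for a reader, here is a toy simulation. `LaggingStore` is a made-up class, not a SimpleDB client; it simply delays the visibility of a write by a fixed number of reads so the stale window is easy to see.

```python
class LaggingStore:
    """Toy stand-in for an eventually consistent store (hypothetical, not a real API).

    A write only becomes visible after `lag_reads` reads have returned the old value.
    """

    def __init__(self, lag_reads: int, initial: dict) -> None:
        self._lag_reads = lag_reads
        self._visible = dict(initial)
        self._pending = {}  # key -> [new_value, stale_reads_remaining]

    def put(self, key, value) -> None:
        self._pending[key] = [value, self._lag_reads]

    def get(self, key):
        pending = self._pending.get(key)
        if pending is not None:
            if pending[1] <= 0:
                # "Replication" has caught up: publish the write.
                self._visible[key] = pending[0]
                del self._pending[key]
            else:
                pending[1] -= 1
        return self._visible.get(key)


store = LaggingStore(lag_reads=2, initial={"stock": 5})
store.put("stock", 4)
reads = [store.get("stock") for _ in range(4)]
print(reads)  # [5, 5, 4, 4]
```

Two nodes reading right after the write would both still see 5 in stock, which is the kind of surprise that makes this model a poor fit for real-time synchronisation.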
The problems:
- The two servers you have listed above are absolutely identical.
- You talk about FusionIO but you also talk about running MySQL and Apache on the same box.
- You don't mention whether the Apache files or the MySQL database (or parts of it such as the `ib_logfile`) will be on the FusionIO drives.
The misconception:
It's not necessarily true that "real hardware will always be faster than virtual machines". It is true that, on the same hardware, the same application will perform better outside a virtual machine, but since you don't have access to Amazon's hardware, that comparison is moot.
The point about the cloud is that it scales horizontally, so if you can serve 100 simultaneous visitors with one server, you can serve 1000 simultaneous visitors with 10 servers and each visitor receives the same speed of response, no matter how many of them you have.
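As a quick back-of-the-envelope version of that arithmetic, assuming load spreads evenly and each server really does handle a fixed number of simultaneous visitors (an idealisation):

```python
import math


def servers_needed(concurrent_visitors: int, visitors_per_server: int) -> int:
    """Number of identical servers needed, assuming perfectly even load distribution."""
    if visitors_per_server <= 0:
        raise ValueError("visitors_per_server must be positive")
    # Always keep at least one server running, even with no traffic.
    return max(1, math.ceil(concurrent_visitors / visitors_per_server))


print(servers_needed(1000, 100))  # 10
print(servers_needed(1001, 100))  # 11
```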
The cloud:
There are a few key differences with cloud providers compared to colocation. If you are able to take advantage of them, they will make hosting in the cloud a clear winner.
- You can spin up and down instances very quickly. If your traffic is very bursty (say, you run a ticket sales website) then you can very easily clone your web tier, database tier and/or storage tier out to hundreds of virtual machines an hour before the Justin Bieber tickets go on sale and shut them all down an hour after to save money. Hardware-based solutions will usually take weeks to increase your capacity and they continue costing money when they aren't being fully utilised.
- The up-front cost can be much lower. The hardware you mention probably costs tens of thousands of dollars in addition to your other hosting costs. My Amazon server costs me about $15 per month and yet I could easily scale it up to a much more beefy virtual machine and scale it out to dozens of load-balanced instances with an hour's notice.
- They do a lot of the work for you. Amazon has other services such as DynamoDB which automatically scale out or in to match the workload or storage requirements you give them. They run on SSDs for speed and are replicated to multiple places, giving you redundancy and availability.
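The bursty-traffic argument can be put in numbers by counting instance-hours rather than money. The figures below (2 instances normally, 200 for a 3-hour sale, a 720-hour month) are hypothetical and only there to show the shape of the comparison.

```python
def instance_hours(baseline: int, peak: int, burst_hours: int, period_hours: int):
    """Return (on_demand, fixed) instance-hours over one period.

    on_demand: `baseline` instances all period, extra instances only during the burst.
    fixed: capacity for `peak` provisioned for the whole period, as with owned hardware.
    """
    if not (0 <= baseline <= peak) or not (0 <= burst_hours <= period_hours):
        raise ValueError("expected 0 <= baseline <= peak and 0 <= burst_hours <= period_hours")
    on_demand = baseline * period_hours + (peak - baseline) * burst_hours
    fixed = peak * period_hours
    return on_demand, fixed


# Hypothetical workload: see the assumptions stated above.
on_demand, fixed = instance_hours(baseline=2, peak=200, burst_hours=3, period_hours=720)
print(on_demand, fixed)  # 2034 144000
```

The gap shrinks quickly as traffic gets less bursty; with a flat load the two numbers are the same.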
That said, your application has to be capable of scaling horizontally. You can't simply throw it into the cloud and expect it to scale forever. For instance, default PHP sessions have two problems:
- They are stored on a local disk meaning you either need to use sticky sessions or a shared disk which will be a bottleneck.
- They are opened with `flock()`, which is an exclusive, blocking file lock. Only one PHP process can be using a session file at a time. This can be a serious problem when you start firing off lots of AJAX calls.
This is only a single example but applications that have not been written with horizontal scaling in mind are usually full of exclusive resources like that one.
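The exclusive-lock behaviour is easy to reproduce outside PHP. The sketch below uses Python's `fcntl.flock` (Unix only) on a temporary file standing in for a session file; it is an illustration of the same kind of lock, not PHP's actual session handler. It uses a non-blocking lock so that the second "request" fails immediately instead of hanging, which is what a blocking lock would make it do.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Open `path` and try to take an exclusive lock without blocking.

    Returns the open file object on success, or None if the lock is already held.
    """
    handle = open(path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Temporary file standing in for a session file.
fd, session_path = tempfile.mkstemp(prefix="sess_")
os.close(fd)

first = try_exclusive_lock(session_path)   # first request takes the lock
second = try_exclusive_lock(session_path)  # second request is refused while it is held
first.close()                              # closing the file releases the lock
third = try_exclusive_lock(session_path)   # now the lock is available again
third.close()
os.remove(session_path)
```

With a blocking lock, the second request would simply wait until the first finished, so parallel AJAX calls that share a session end up running one after another.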
If you are running a distributed database (which Amazon's database services are) then your app also needs to be able to deal with the trade-offs inherent in the CAP theorem. This states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. You will need to know which of the three you don't have and make your app compensate for it.
If your application suits hardware, go for hardware. If it suits the cloud, go for the cloud.
Note: I have used Amazon as an example here but there are other cloud hosting providers with similar capabilities of spinning up and down instances very quickly and only charging you for what you actually use.
I have not yet dealt with the size of data that you are referring to. However, I did find this handy script and tweaked it to push my backup files to Amazon S3.
https://github.com/woxxy/MySQL-backup-to-Amazon-S3
If you have not already seen the recent announcement for Amazon Glacier, you should take a look. It may be more along the lines of what you need for pure backups.
http://aws.amazon.com/glacier/
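For a rough idea of what a dump-and-upload job involves, here is a small sketch that only builds the two command lines. It is not how the linked script works; it assumes `mysqldump` and the AWS CLI (`aws s3 cp`) are installed, and the database name, bucket name, and `mysql/<database>/<date>.sql` key layout are placeholders.

```python
import datetime


def backup_commands(database: str, bucket: str, day: datetime.date, workdir: str = "/tmp"):
    """Build (dump_cmd, upload_cmd) argument lists for a dated MySQL backup."""
    stamp = day.isoformat()
    dump_path = f"{workdir}/{database}-{stamp}.sql"
    object_url = f"s3://{bucket}/mysql/{database}/{stamp}.sql"

    # --single-transaction takes a consistent snapshot of InnoDB tables without locking them;
    # --quick streams rows instead of buffering whole tables in memory.
    dump_cmd = [
        "mysqldump",
        "--single-transaction",
        "--quick",
        f"--result-file={dump_path}",
        database,
    ]
    upload_cmd = ["aws", "s3", "cp", dump_path, object_url]
    return dump_cmd, upload_cmd


# Placeholder names, for illustration only.
dump_cmd, upload_cmd = backup_commands("shop", "example-backups", datetime.date(2012, 8, 21))
```

Each list can be handed to `subprocess.run(cmd, check=True)` in turn; for very large databases you would probably want to compress the dump before uploading it.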