There's a whole series of articles on this topic @ http://highscalability.com
I haven't used AWS, but I do have experience running virtual instances in a datacenter, on Rackspace, and on App Engine.
How you scale (up vs. out) is largely determined by what you're trying to do. Some apps will be I/O-intensive, some will be CPU-intensive. Your bottleneck might be inbound I/O, processing power, or backend I/O, or a combination of the three in varying amounts depending on where you are in your app's lifecycle. Each will require a slightly different strategy.
Using something like AWS, you generally want to scale out, and you have to begin with the end in mind and keep your apps loosely coupled. This will allow you to throw up another instance to meet demand. It's fine to keep your database on the same instance as your main app when you're starting out, but that's usually the first thing to get spun off onto its own server.
So you might start out with everything running on one instance. Then you start to get some traffic and notice the database is eating up your CPU, so you move the database to another instance, and everything is great. Until you get more traffic... and you notice your front end can't keep up. So you fire up a couple more instances, load-balance them, and you're happy for a while, scaling up to maybe a dozen web servers... But then you get some more traffic, and while the front end is keeping up, now your database machine is starting to thrash. So you replicate your database to a master and a couple of slaves, and everything is fine... and so on and so forth.
On scaling out, the EBS volume and its data will not be "cloned" onto the new instance. To get that behavior you'd want to automate it at boot:
- Grab the latest snapshot of the WS-1 EBS volume
- Create a new volume from that snapshot and attach it
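A minimal sketch of those two boot-time steps, assuming boto3 is available and the instance's IAM role can describe snapshots and create/attach volumes. The function names and the `Name` tag filter are my own illustrative choices; the snapshot-picking logic is kept pure so it can be checked without AWS credentials.

```python
# Sketch: at boot, find the newest snapshot of the WS-1 volume,
# then create a fresh EBS volume from it and attach it.

def find_latest_snapshot(snapshots):
    """Pick the most recent completed snapshot from a DescribeSnapshots-style list."""
    completed = [s for s in snapshots if s["State"] == "completed"]
    if not completed:
        raise RuntimeError("no completed snapshot found")
    return max(completed, key=lambda s: s["StartTime"])

def attach_latest(volume_tag_value, instance_id, az, device="/dev/xvdf"):
    """Create a volume from the newest tagged snapshot and attach it."""
    import boto3  # imported here so the helper above stays testable offline
    ec2 = boto3.client("ec2")
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": [volume_tag_value]}],
    )
    snap = find_latest_snapshot(resp["Snapshots"])
    vol = ec2.create_volume(SnapshotId=snap["SnapshotId"], AvailabilityZone=az)
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device=device)
```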
Another method, depending on how much data is on the EBS, is to pull it down from S3.
With the security group, you can allow any server in the app_security_group to have access to any server in the nfs_server_group. This will allow you to dynamically update the security groups.
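As a sketch of that group-to-group rule, again assuming boto3: referencing the app group's ID as the traffic source (rather than IP ranges) is what lets new app instances get access automatically. The rule payload is built as plain data so it can be checked offline; `group_rule` and `allow_app_to_nfs` are hypothetical names.

```python
# Sketch: allow NFS traffic from every instance in app_security_group
# to the nfs_server_group by referencing the source group, not IPs.

def group_rule(from_port, to_port, source_group_id, proto="tcp"):
    """Build an IpPermissions entry that grants access to a whole security group."""
    return {
        "IpProtocol": proto,
        "FromPort": from_port,
        "ToPort": to_port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }

def allow_app_to_nfs(nfs_group_id, app_group_id):
    """Authorize the app group as an ingress source on the NFS server's group."""
    import boto3  # local import keeps group_rule testable offline
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=nfs_group_id,
        IpPermissions=[group_rule(2049, 2049, app_group_id)],  # 2049 = NFS
    )
```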
Hope that makes sense.
Best Answer
First off, I would not recommend using S3 as your shared filesystem. It can be extremely costly due to how the I/O works.
There are a couple of ways to do this.
The easiest way is to add your hosts' EC2 security group as a source rule on the security group of the NFS server while blocking all other unnecessary traffic. This usually means having two rules: one for your management of the NFS server (typically over SSH), and another allowing all traffic from your connecting hosts' group. Only that traffic will reach the NFS host. At that point you can safely set the permissions in NFS to 10.0.0.0/8, or just leave it open to everyone; no connections to NFS will be allowed except those permitted by your security group settings.
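With the security group doing the real filtering, the export itself can stay broad. A minimal `/etc/exports` sketch on the NFS server (the export path is illustrative):

```
# /etc/exports -- the EC2 security group is the real gatekeeper
/export/shared 10.0.0.0/8(rw,sync,no_subtree_check)
```

After editing, run `exportfs -ra` to apply the change.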
Alternatively, you can set up a startup script on the connecting hosts that either remotely configures the NFS host or pings it somehow so that it knows how to configure itself. This way you can keep the NFS settings per-IP rather than more open.
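One way to sketch that per-IP approach, assuming the connecting host has SSH key access to the NFS server; `exports_entry` and `register_with_nfs` are hypothetical helpers, and only the line-building part is exercised here since the SSH step needs a live host.

```python
# Sketch: a connecting host registers its own IP with the NFS server
# so exports stay per-IP instead of wide open.
import subprocess

def exports_entry(path, client_ip, opts="rw,sync,no_subtree_check"):
    """Build one /etc/exports line granting a single client access."""
    return f"{path} {client_ip}({opts})"

def register_with_nfs(nfs_host, path, client_ip):
    """Append the entry remotely and re-export; assumes SSH key auth is set up."""
    line = exports_entry(path, client_ip)
    subprocess.run(
        ["ssh", nfs_host,
         f"echo '{line}' | sudo tee -a /etc/exports && sudo exportfs -ra"],
        check=True,
    )
```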
Update June 2015:
Elastic File System is coming and would be a much better solution. http://aws.amazon.com/efs/