Firstly, some raw data, taken from S. Ostermann, et al., 2010:
Basic instance specs:
+-----------+---------+------+-------+-------+------+-------+---------------+---------------+
| Name      | ECUs    | RAM  | Arch. | I/O   | Disk | Cost  | Reserve       | Reserved Cost |
|           | (Cores) | [GB] | [bit] | Perf. | [GB] | [$/h] | [$/y], [$/3y] | [$/h]         |
+-----------+---------+------+-------+-------+------+-------+---------------+---------------+
| m1.small  | 1 (1)   | 1.7  | 32    | Med   | 160  | 0.1   | 325, 500      | 0.03          |
| m1.large  | 4 (2)   | 7.5  | 64    | High  | 850  | 0.4   | 1300, 2000    | 0.12          |
| m1.xlarge | 8 (4)   | 15   | 64    | High  | 1690 | 0.8   | 2600, 4000    | 0.24          |
| c1.medium | 5 (2)   | 1.7  | 32    | Med   | 350  | 0.2   | 650, 1000     | 0.06          |
| c1.xlarge | 20 (8)  | 7    | 64    | High  | 1690 | 0.8   | 2600, 4000    | 0.24          |
+-----------+---------+------+-------+-------+------+-------+---------------+---------------+
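To put the reservation columns in perspective, here is a small sketch of the break-even arithmetic for a 1-year reservation, using only the prices in the table. It assumes the instance is billed for every hour it runs and ignores the 3-year option; the function name is my own.

```python
# Break-even point for a 1-year reservation, using the prices in the table above.
# Assumption: billing is per running hour and nothing else changes over the year.
HOURS_PER_YEAR = 365 * 24  # 8760

def break_even_hours(upfront, reserved_rate, on_demand_rate):
    """Hours of use at which upfront + reserved_rate * h equals on_demand_rate * h."""
    return upfront / (on_demand_rate - reserved_rate)

for name, upfront, reserved, on_demand in [
    ("m1.small", 325, 0.03, 0.1),
    ("m1.large", 1300, 0.12, 0.4),
    ("c1.xlarge", 2600, 0.24, 0.8),
]:
    hours = break_even_hours(upfront, reserved, on_demand)
    print(f"{name}: {hours:.0f} h ({hours / HOURS_PER_YEAR:.0%} of a year)")
```

With these prices every size breaks even after roughly 4,643 hours, i.e. about 53% utilization over the year; an instance that runs around the clock is therefore considerably cheaper when reserved.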
Basic performance/cost analysis:
+---------------+------------+----------+--------+-----------+---------+--------+-----------+----------+
| System        | Peak Perf. | HPL      | STREAM | RandomAc. | Latency | Bandw. | GFLOPS    | GFLOPS   |
|               | [GFLOPS]   | [GFLOPS] | [GBps] | [MUPs]    | [µs]    | [GBps] | per ECU   | per $/h  |
+---------------+------------+----------+--------+-----------+---------+--------+-----------+----------+
| m1.small      | 4.4        | 1.96     | 3.49   | 11.6      | -       | -      | 1.96      | 19.6     |
| m1.large      | 17.6       | 7.15     | 2.38   | 54.35     | 20.48   | 0.7    | 1.79      | 17.9     |
| m1.xlarge     | 35.2       | 11.38    | 3.47   | 168.64    | 17.87   | 0.92   | 1.42      | 14.2     |
| c1.medium     | 22         | 3.91     | 3.84   | 46.73     | 13.92   | 2.07   | 0.78      | 19.6     |
| c1.xlarge     | 88         | 51.58    | 15.65  | 249.66    | 14.19   | 1.49   | 2.58      | 64.5     |
| 16x m1.small  | 70.4       | 27.8     | 11.95  | 77.83     | 68.24   | 0.1    | 1.74      | 17.4     |
| 16x c1.xlarge | 1408       | 425.82   | 16.38  | 207.06    | 45.2    | 0.75   | 1.33      | 33.3     |
+---------------+------------+----------+--------+-----------+---------+--------+-----------+----------+
Actual (HPL) performance is usually under 50% of the theoretical peak; c1.xlarge, at about 59%, is the only exception. The one set of values that might be suspect is that for c1.medium, which doesn't quite agree with the expected results (e.g. bandwidth, and an HPL result of only about 18% of peak).
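The derived columns can be recomputed from the raw measurements, which also makes the "under 50% of peak" observation easy to check. This is a minimal sketch: the tuple layout and function name are my own, and the cluster prices are simply 16 times the single-instance on-demand rate.

```python
# Recompute efficiency and price/performance from the measured values above.
# Each row: (system, peak GFLOPS, HPL GFLOPS, total ECUs, on-demand cost in $/h)
ROWS = [
    ("m1.small", 4.4, 1.96, 1, 0.1),
    ("m1.large", 17.6, 7.15, 4, 0.4),
    ("m1.xlarge", 35.2, 11.38, 8, 0.8),
    ("c1.medium", 22, 3.91, 5, 0.2),
    ("c1.xlarge", 88, 51.58, 20, 0.8),
    ("16x m1.small", 70.4, 27.8, 16, 1.6),
    ("16x c1.xlarge", 1408, 425.82, 320, 12.8),
]

def derived(peak, hpl, ecus, price):
    """Return (HPL as a fraction of peak, GFLOPS per ECU, GFLOPS per $/h)."""
    return hpl / peak, hpl / ecus, hpl / price

for system, peak, hpl, ecus, price in ROWS:
    efficiency, per_ecu, per_dollar = derived(peak, hpl, ecus, price)
    print(f"{system:14} {efficiency:6.1%} {per_ecu:5.2f} {per_dollar:5.1f}")
```

The efficiency column ranges from about 18% (c1.medium) to about 59% (c1.xlarge), with every other configuration between 30% and 45%.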
The primary cost of EC2 for a typical workload is the cost of instances; other costs (bandwidth, provisioned storage, etc.) are typically under 25% of the total. One doesn't expect performance to scale perfectly, and that is evident from the data above. Especially with regard to horizontal scaling, efficiency drops off significantly as you add more compute capacity: 16 c1.xlarge instances deliver 1.33 GFLOPS per ECU and 33.3 GFLOPS per $/h, against 2.58 and 64.5 for a single c1.xlarge.
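One way to quantify that drop-off is to compare a cluster's HPL result with N times the result of a single instance of the same type. The sketch below does that for the two 16-instance rows; "scaling efficiency" here is my own label for this ratio, not a metric from the paper.

```python
# Horizontal scaling efficiency: cluster HPL result relative to n isolated instances.
def scaling_efficiency(cluster_gflops, single_gflops, n):
    return cluster_gflops / (n * single_gflops)

print(f"16x m1.small:  {scaling_efficiency(27.8, 1.96, 16):.0%}")
print(f"16x c1.xlarge: {scaling_efficiency(425.82, 51.58, 16):.0%}")
```

This gives about 89% for the m1.small cluster but only about 52% for the c1.xlarge cluster: the faster each node is, the more the (much higher) inter-instance latency in the table holds the cluster back.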
Given the above, and keeping in mind that there are other factors beyond raw compute performance (e.g. I/O performance, memory, etc.), it stands to reason that vertical scaling is the most economical approach.
Unfortunately, there are considerations beyond simply the economics of the scenario, reliability being a key one. With a single instance, the failure of that instance takes down your entire setup. One possible mitigation is auto-scaling (i.e. maintaining an instance count of 1); however, a single instance is still exposed to problems in its availability zone, and your site is down until the replacement has launched.
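An "instance count of 1" group can be expressed as an Auto Scaling group whose minimum, maximum and desired sizes are all 1. The sketch below only builds the request; its keys follow the keyword arguments of boto3's `create_auto_scaling_group`, while the group name, launch template ID and zones are hypothetical placeholders. Listing more than one zone may let a replacement launch outside an impaired zone, but it does not remove the downtime described above.

```python
def single_instance_asg(name, launch_template_id, zones):
    """Request for an Auto Scaling group pinned to exactly one instance.

    All argument values are placeholders (assumptions), not real resources.
    """
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        # Min = Max = Desired = 1: a failed instance is replaced, never added to.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Several zones give the group somewhere else to launch a replacement.
        "AvailabilityZones": list(zones),
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,
    }

params = single_instance_asg(
    "web-single", "lt-0123456789abcdef0", ["us-east-1a", "us-east-1b"]
)
```

With credentials configured, the request would be sent with `boto3.client("autoscaling").create_auto_scaling_group(**params)`.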
At some point it is necessary to scale horizontally; the question is simply when. I would probably suggest:
- Scale vertically by at least a few instance sizes (even more so if you start with a t1.micro)
- Move your databases to separate instances (because they don't scale the same way as your web servers)
- Scale horizontally until you have a bit of redundancy
- Scale vertically until you reach the maximum instance size
- Scale horizontally thereafter (possibly using smaller instances initially)
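The order suggested in the list above can be sketched as a small decision function. The instance ladder and both thresholds (how many sizes count as "a few", how many web instances count as "a bit of redundancy") are hypothetical values chosen for illustration.

```python
# Hypothetical instance-size ladder; the names are examples, not a recommendation.
LADDER = ["t1.micro", "m1.small", "m1.large", "m1.xlarge"]

def next_step(size, db_separate, web_instances, redundant_count=2, early_sizes=2):
    """Suggest the next scaling move, following the order listed above.

    redundant_count and early_sizes are illustrative thresholds (assumptions).
    """
    i = LADDER.index(size)
    if i < early_sizes:
        return "scale vertically"
    if not db_separate:
        return "move the database to its own instance"
    if web_instances < redundant_count:
        return "scale horizontally for redundancy"
    if i < len(LADDER) - 1:
        return "scale vertically"
    return "scale horizontally"
```

For example, `next_step("m1.large", db_separate=True, web_instances=1)` asks for a second web instance before any further vertical growth.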
Getting back to the questions at hand: running a single website per instance (or per set of instances) will always be more expensive. In addition to the fixed costs being higher (e.g. one load balancer per website, instead of a single shared load balancer), you will not use your instances as efficiently: one website may see high load while the others are mostly idle, which leaves some instances overloaded and others sitting idle. In terms of logistics, the problem might not be as bad as one would imagine; the main burden is managing everything independently (which configuration management tools such as Puppet or Chef can help with, although that step is usually not taken until your setup grows a bit larger).
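The efficiency argument comes down to "sum of peaks" versus "peak of the sum": dedicated instances must each be sized for their own site's peak, while a shared pool only has to cover the highest combined load. A minimal sketch, using made-up hourly loads for three sites that peak at different times:

```python
def capacity_needed(loads_by_site):
    """Return (dedicated, shared) capacity for per-site vs pooled instances.

    loads_by_site holds one list of hourly load values per site, all the same length.
    """
    dedicated = sum(max(series) for series in loads_by_site)
    shared = max(sum(hour) for hour in zip(*loads_by_site))
    return dedicated, shared

# Hypothetical loads for three sites that peak at different times of day.
sites = [
    [9, 2, 1, 1],
    [1, 8, 2, 1],
    [1, 2, 7, 2],
]
print(capacity_needed(sites))  # → (24, 12)
```

In this made-up case the shared pool needs half the capacity; the real saving depends on how well your sites' peaks are spread out.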
On the other hand, one of the limitations of EC2 instances is that you can only assign a single public IP address to a given instance, which has implications for certain SSL setups (without SNI, each SSL certificate needs its own IP address).
You can certainly create your own AMIs; it is fairly standard practice. I usually start with Amazon's Linux AMI, as I find it to have the least overhead (it is light on resources and fast) and to be the best supported by AWS (it is regularly updated, etc.). I also prefer the RHEL/CentOS distributions (on which Amazon Linux is based) to the Debian/Ubuntu ones that are the other popular choice. Once you have customized an instance, you can take snapshots of its EBS volume(s) and register an AMI, passing the snapshot ID as the source of the root volume. In theory you can customize your operating system much further, even to the extent of building your own distribution (while still using the Amazon kernels); however, unless you have a very specific use case, that is unlikely to be particularly beneficial. My personal preference for running WordPress is Varnish + Nginx + PHP-FPM (with W3TC, i.e. W3 Total Cache, for WordPress); I find it much easier on resources than the typical LAMP stack.
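The snapshot-to-AMI step can be sketched as the request for boto3's EC2 `register_image`, mapping the root device to the snapshot. This assumes an HVM, x86_64 image with `/dev/xvda` as the root device (as on current Amazon Linux); the image name and snapshot ID are placeholders, and older paravirtual images would additionally need a kernel ID.

```python
def register_image_request(name, snapshot_id, root_device="/dev/xvda"):
    """Keyword arguments for an EBS-backed AMI whose root volume comes from snapshot_id.

    Assumes HVM/x86_64; name and snapshot_id are placeholders.
    """
    return {
        "Name": name,
        "Architecture": "x86_64",
        "VirtualizationType": "hvm",
        # The root device name must match a device in the block device mapping.
        "RootDeviceName": root_device,
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": True},
            }
        ],
    }

request = register_image_request("wordpress-base", "snap-0123456789abcdef0")
```

The snapshot itself would come from `ec2.create_snapshot(VolumeId=...)`, and the AMI from `boto3.client("ec2").register_image(**request)`.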
Finally, to address scaling once more: beyond the basic economics discussed above, the difficulty comes down to making multiple instances 'appear' as one. This includes ensuring that every instance serves the same data, load balancing between your instances, and handling details such as PHP sessions. It will be more difficult if every site runs on its own set of instances, but likely not by a significant margin (since, hopefully, you will have configured the functionality into your AMI). Multiple instances do, however, mean a more complex system and more things to keep an eye on. (There are quite a few questions on ServerFault on this topic, such as this, this, or this; if you need details on how to scale a specific setup, please ask it as another question.)
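PHP sessions illustrate the problem well: with the default file-based handler (`session.save_handler = files`), each instance keeps sessions on its own disk, so a round-robin load balancer can send a logged-in user to an instance that has never seen them. The toy Python model below (a stand-in for PHP, with hypothetical class and function names) shows the failure and the usual fix of a shared session store; sticky sessions on the load balancer are the other common workaround.

```python
from itertools import cycle

class Instance:
    """Toy web server whose session store is a plain dict."""

    def __init__(self, sessions):
        self.sessions = sessions

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def current_user(self, session_id):
        return self.sessions.get(session_id)

def second_request_sees_login(shared_store):
    shared = {}
    # Each instance gets its own dict unless a shared store is used.
    instances = [Instance(shared if shared_store else {}) for _ in range(2)]
    balancer = cycle(instances)  # naive round-robin, no sticky sessions
    next(balancer).login("abc123", "alice")       # request 1 lands on instance 0
    return next(balancer).current_user("abc123")  # request 2 lands on instance 1

print(second_request_sees_login(shared_store=False))  # None
print(second_request_sees_login(shared_store=True))   # alice
```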
As a concluding comment: unless your setup has particular needs for an individual site to run on its own instance/cluster (e.g. a vastly different configuration or requirements), I would favour running multiple sites on a single instance/cluster, as it is simpler to scale, more economical and efficient, and more aligned with the spirit of 'cloud computing' (i.e. shared resources).
References:
Best Answer
I think all those answers saying "we don't know" are lazy. You do know what types of operations you might run against a DB, and you can at least provide some information about those types of operations.
For example, we ran tests on db.t3.medium, db.t3.small, db.t4g.medium and db.t4g.small AWS RDS PostgreSQL instances and found that all of them were very similar in read/write speed for operations that are expected to take a short time (~1-5 seconds).
However, where the .medium instances (both T3 and T4g) really stood out was creating indexes on a large table and running text searches over a large table. In those cases both the T3 and T4g .medium instances were about 2x faster than the .small instances, which have the same 2 vCPUs but half the RAM (2 GiB vs 4 GiB). T3 was slightly ahead in our tests (roughly 5-10%).
So I'd say go with T3 and choose the RAM size depending on the largest table in your DB.
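As a rough sketch of that sizing rule, the helper below picks the smallest db.t3 class whose memory covers the largest table. The RAM figures are the published db.t3 sizes; the 1.5x headroom factor is purely my own assumption (leaving room for indexes, connections and the OS), not something we measured.

```python
# Memory per db.t3 class in GiB (published AWS RDS instance class sizes).
T3_RAM_GIB = {
    "db.t3.micro": 1,
    "db.t3.small": 2,
    "db.t3.medium": 4,
    "db.t3.large": 8,
    "db.t3.xlarge": 16,
    "db.t3.2xlarge": 32,
}

def smallest_class_for(largest_table_gib, headroom=1.5):
    """Smallest db.t3 class with RAM >= largest table * headroom (an assumed factor).

    Returns None when even the largest T3 class is too small.
    """
    needed = largest_table_gib * headroom
    for name, ram in sorted(T3_RAM_GIB.items(), key=lambda item: item[1]):
        if ram >= needed:
            return name
    return None
```

For a 2.5 GiB table, `smallest_class_for(2.5)` suggests db.t3.medium; beyond roughly 21 GiB it returns None, a hint to look past the burstable classes.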