I have 6 WordPress websites running on a single EC2 instance. All of the websites connect to databases on the same RDS instance.
Earlier today, traffic to the largest website peaked and the RDS instance became the bottleneck: CPU utilization stayed at 100% for over an hour. This affected all of my websites, which took forever to load.
To prevent this from happening again, which of the following matters most, so that I can invest time and effort in it first?
(I will work on all later, I just need to prioritise now)
- To improve caching for all websites
- To fine-tune the database server
- To fine-tune my Apache server
What will be the effect on user experience for my websites? Some quick searches suggest that I should limit the number of concurrent connections to my web server, but wouldn't that prevent users from accessing my websites?
More background:
- My largest website has 140k visits and 660k page views a month. The other 5 websites together add up to much less than that.
- I'm using a large EC2 instance as the web server
- I'm using a medium RDS instance as the database server
What I've already done:
- Used the W3 Total Cache plugin for caching on most of the websites, especially the largest one (I can barely think of anything else I could do in terms of caching for the largest website)
Am I using my resources wastefully, or are there simply not enough resources for my websites? Or rather, how do I answer that question myself?
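One rough starting point for that last question is to convert the monthly page-view figure into a request rate. The sketch below is only back-of-the-envelope arithmetic: it assumes a 30-day month, and the peak factor of 10 is a purely hypothetical peak-to-average ratio, not a measured value. The function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def average_rate(monthly_page_views: int) -> float:
    """Average page views per second, spread evenly over the month."""
    return monthly_page_views / SECONDS_PER_MONTH


def peak_rate(monthly_page_views: int, peak_factor: float) -> float:
    """Estimated peak page views per second for an assumed peak-to-average ratio."""
    return average_rate(monthly_page_views) * peak_factor


if __name__ == "__main__":
    views = 660_000  # page views per month for the largest site
    print(f"average: {average_rate(views):.2f} page views/s")
    # The factor of 10 is a hypothetical placeholder; measure your own peaks.
    print(f"peak (assumed 10x): {peak_rate(views, 10):.2f} page views/s")
```

The average comes out to roughly 0.25 page views per second. Comparing an estimate like this against the CPU and connection graphs recorded during the incident can help indicate whether the load itself is high or whether each page view is unusually expensive for the database.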
Best Answer
Running everything on a single EC2 instance defeats the whole point of a cloud-based deployment: the ability to autoscale and self-heal. As I've written before, autoscaling is the heart and soul of AWS and if you're not using it, you'd be better off using a traditional co-lo server or VPS. It would be both cheaper and more durable.
I just completed an AWS deployment that is very similar to your needs. The client runs three fairly high-volume WordPress sites (quite a bit more traffic than yours). The config looks like this:
This design does not have any single point-of-failure*, will automatically self-heal if an instance dies, and is smart enough to scale up or down according to resource demand. Total cost: about $200/mo. if you opt to purchase 1-year reserved instances.
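To illustrate what "scale up or down according to resource demand" means in practice, here is a minimal sketch of a threshold-based scaling decision. It is not the configuration of the deployment described above: the class name, the CPU thresholds, and the instance limits are all assumptions chosen for the example. On AWS this kind of logic is normally expressed through Auto Scaling group policies and CloudWatch alarms rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; every default below is an illustrative assumption."""

    min_instances: int = 2      # keep at least two so one failure is survivable
    max_instances: int = 6      # cap to bound cost
    scale_out_cpu: float = 70.0  # add an instance above this average CPU %
    scale_in_cpu: float = 30.0   # remove an instance below this average CPU %

    def desired_count(self, current: int, avg_cpu: float) -> int:
        """Return the instance count to aim for, one step at a time."""
        if avg_cpu > self.scale_out_cpu:
            target = current + 1
        elif avg_cpu < self.scale_in_cpu:
            target = current - 1
        else:
            target = current
        # Clamp to the configured bounds.
        return max(self.min_instances, min(self.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy()
    for count, cpu in [(2, 85.0), (3, 20.0), (2, 10.0), (6, 95.0)]:
        print(count, cpu, "->", policy.desired_count(count, cpu))
```

The gap between the two thresholds is deliberate: without it, a load hovering around a single threshold would cause instances to be added and removed repeatedly.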
I'm working on putting this configuration into a combination of a CloudFormation template and cloud-init/Python scripts that are automatically pulled from GitHub on boot. Basically, it will allow anyone to push a button, wait about an hour, and come back to find this whole environment ready. I hope to have this completed by the end of the year. If you'd be interested in getting a copy of the template, send me an email to "jamie" at the website listed in my profile.
* The NAT instance is a SPoF, but that's primarily a design limitation of VPC. And it's a non-critical component that could fail and not affect the application.