MySQL: Ways to optimize website performance for WordPress on Amazon EC2 (Apache) and RDS MySQL

amazon ec2, amazon-rds, MySQL, website, Wordpress

I have 6 WordPress websites running on a single EC2 instance. All of the websites connect to databases on the same RDS instance.

Earlier today, traffic to the largest website peaked and the RDS instance became a bottleneck: CPU utilization was at 100% for over an hour. It affected all of my websites, as they all took forever to load.

To prevent this from happening again, which of the following matters most, so that I can invest time and effort in it first?
(I will work on all of them later; I just need to prioritise now.)

  • To improve caching for all websites
  • To fine-tune the database server
  • To fine-tune my Apache server

What will be the effect on user experience for my websites? Some quick searches suggest that I should limit the number of concurrent connections to my web server, but wouldn't that prevent users from accessing my websites?

More background:

  • My largest website has 140k visits and 660k page views a month. The other 5 websites together should add up to much less than that.
  • I'm using a large EC2 instance as the web server
  • I'm using a medium RDS instance as the database server
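
For a sense of scale, the monthly figures above can be turned into an average rate. This is only a rough sketch: it assumes a 30-day month and evenly spread traffic (which a peak like today's clearly is not), and the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(monthly_count: int, days_in_month: int = 30) -> float:
    """Average events per second, assuming traffic is spread evenly."""
    return monthly_count / (days_in_month * SECONDS_PER_DAY)


# Figures for the largest website, taken from the list above.
monthly_visits = 140_000
monthly_page_views = 660_000

page_views_per_sec = average_rate(monthly_page_views)
pages_per_visit = monthly_page_views / monthly_visits

print(f"average page views per second: {page_views_per_sec:.2f}")  # 0.25
print(f"average pages per visit: {pages_per_visit:.2f}")           # 4.71
```

The average works out to roughly a quarter of a page view per second. The peak rate is at least that high and probably much higher, so this number alone does not tell me whether the instances are undersized.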

What I've already done:

  • I use the W3 Total Cache plugin for caching on most of the websites, especially the largest one (I can barely think of anything else I could do in terms of caching for the largest website)

Am I using my resources wastefully, or are there simply not enough resources for my websites? Or rather, how do I answer that question myself?

Best Answer

Running everything on a single EC2 instance defeats the whole point of a cloud-based deployment: the ability to autoscale and self-heal. As I've written before, autoscaling is the heart and soul of AWS and if you're not using it, you'd be better off using a traditional co-lo server or VPS. It would be both cheaper and more durable.

I just completed an AWS deployment that is very similar to your needs. The client runs three fairly high-volume WordPress sites (quite a bit more traffic than yours). The config looks like this:

  • Everything in a VPC for additional security. VPC contains six subnets across two availability zones (AZ)
  • An Elastic Load Balancer that spans two subnets/AZs
  • An autoscaling (AS) group of m1.small instances serves as the application tier. There is a minimum of two app servers and a maximum of 10, depending on the traffic load. In normal operation, it runs with just two and average CPU utilization is consistently under 15%. The AS group scales up and down two instances at a time, one in each AZ.
  • Each app instance runs Nginx + PHP-FPM + APC. In front of this stack I have Varnish installed to provide additional caching.
  • A small, multi-AZ RDS instance in the two private subnets. Even under extremely high traffic load, it's barely touched due to the amount of application-side caching.
  • Static assets are served from CloudFront to reduce the load on the application servers even further.
  • Files are stored on a pair of mirroring Gluster nodes, each in a separate AZ for HA purposes. A floating elastic IP address is assigned to one of the nodes, which gives you SSH or SFTP access to the actual WordPress files. The nodes are part of their own AS group, so if a file server were to die, it would be automatically terminated and recreated. On boot, it reattaches the volume containing the WordPress files. Automatic backups occur via a series of hourly volume snapshots.
  • A t1.micro instance serves as a NAT gateway for instances in the private subnets.
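
To make the scaling behaviour concrete, here is a minimal Python sketch of the step policy described in the list above. The minimum of 2, maximum of 10, and step of two instances come from the config; the CPU thresholds (70% to scale up, 20% to scale down) and the function names are assumptions for illustration only, not the values used in the actual deployment, and this is a simulation rather than AWS API code.

```python
MIN_INSTANCES = 2   # from the config above
MAX_INSTANCES = 10  # from the config above
STEP = 2            # two instances at a time, one per AZ

# Hypothetical thresholds; the real alarm values are not stated here.
SCALE_UP_CPU = 70.0
SCALE_DOWN_CPU = 20.0


def next_capacity(current: int, avg_cpu: float) -> int:
    """Return the group size after one scaling evaluation."""
    if avg_cpu >= SCALE_UP_CPU:
        target = current + STEP
    elif avg_cpu <= SCALE_DOWN_CPU:
        target = current - STEP
    else:
        target = current
    # Clamp to the group's configured bounds.
    return max(MIN_INSTANCES, min(MAX_INSTANCES, target))


def simulate(cpu_samples, start=MIN_INSTANCES):
    """Apply next_capacity to a series of average-CPU readings."""
    sizes = []
    current = start
    for cpu in cpu_samples:
        current = next_capacity(current, cpu)
        sizes.append(current)
    return sizes


# A traffic spike followed by a quiet period.
print(simulate([90, 90, 50, 10, 10]))  # [4, 6, 6, 4, 2]
```

The group grows in pairs while CPU stays high, holds steady in the middle band, and shrinks back to the two-instance floor once the load drops.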

This design does not have any single point of failure*, will automatically self-heal if an instance dies, and is smart enough to scale up or down according to resource demand. Total cost: about $200/mo. if you opt to purchase 1-year reserved instances.

I'm working on putting this configuration into a combination of a CloudFormation template and cloud-init/Python scripts that are automatically pulled from GitHub on boot. Basically, it will allow anyone to push a button, wait about an hour, and come back to find this whole environment waiting. I hope to have this completed by the end of the year. If you'd be interested in getting a copy of the template, send me an email to "jamie" at the website listed in my profile.

* The NAT instance is a SPoF, but that's primarily a design limitation of VPC. And it's a non-critical component that could fail and not affect the application.
