Magento – Poor Magento Performance with AWS EC2

aws performance

We are in the process of deploying our Magento shop to AWS EC2 but are seeing serious performance degradation. Our development server runs the same Magento code as the production server.

Here are the details for the development server:

Ubuntu Server 12.04
4 virtual CPUs (the host is ESXi 5.5; the physical CPU is an i7-4790 @ 3.6GHz)
4 GB memory
MySQL 5.5 on the same VM
Redis cache enabled for backend and session (on a different VM)
APC cache enabled
local network

Using the Firefox Network Monitor, the waiting time to access a product page is approximately 500ms.

For the production server(s):

Ubuntu Server 12.04 HVM
frontend and admin are on different hosts, both with instance type m3.medium (1 virtual CPU, 3.75 GB memory; the physical CPU is probably a Xeon E5-2670 v2 @ 2.5GHz)
code is mounted from admin to frontend using NFSv4, with these options: rw,insecure,no_subtree_check,async
MySQL 5.5 on Amazon RDS in the same region and subnet but a different availability zone (instance type db.m3.medium, 1 virtual CPU, 3.75 GB memory, query_cache_size is set to 16M... for some reason the default was set to 0)
Redis cache enabled for backend and session (on a different EC2 instance)
APC cache enabled
AWS region is US East (latency from our office to AWS US East is about 250ms)

Using the Firefox Network Monitor, the waiting time to access a product page is approximately 2000ms.

We tried running Magento from the local filesystem instead of NFS, but it did not noticeably reduce the waiting time.

We tried accessing a page that doesn't hit the database (such as a Magento 404 page), and the waiting time on AWS EC2 is approximately 4 times longer than on our development server.

We are not sure whether moving from m3.medium to, say, a c3.xlarge instance type will make any difference.

Can anyone provide any insights?

Update

We just created a new c3.xlarge instance (4 virtual CPUs and 7.5 GB memory) and put Magento there. The waiting time dropped to approximately 1500ms, but this is still nowhere near the 500ms on our development server.

Best Answer

You are comparing dedicated hardware running on bare metal (presumably locally) to shared, highly contended hardware virtualised behind a hypervisor.

CPU speed dictates load time, and the AWS lottery has served you a CPU roughly 30% slower than your dev box (2.5GHz versus 3.6GHz). It's no surprise that a cloud service is slower: it's widely regarded as a poor performer, and its strength is merely scale, in that you can spin up 600 slow servers on a whim. If you want good performance, drop "the cloud".

But if you are determined to stay with a cloud service, then start with the basics.

  1. Begin with connectivity. Before trying to ascertain any application layer bottlenecks, understand your test conditions first.

    Measure TCP latency between the test client and the development server, and between the test client and the live server. I'd suggest tcpping for a layer 4 test.

    Or, simpler still, run the test on each server itself. That rules out the network and lets you narrow down the next bottleneck.

  2. Move on to CPU and I/O. Your delays are going to be caused by either CPU contention or I/O contention. Because it's a VPS, the hardware is mostly emulated, which adds considerable overhead, and because it's shared, it is also contended.

    Test against a simple local static resource over HTTP, then repeat the same test with the resource mounted locally in RAM. Then test the tmpfs and the physical partition with something like dd. This will allow you to compare disk performance/throughput/seek with that of RAM, and to see whether either has any outliers.

    Note that there's a very strong likelihood that your test results will differ each time you test, because it's shared. You can't account for what other users on the same hypervisor might be doing, what context switching might be occurring or what peculiarities may exist in the VM emulation itself.
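If dd isn't convenient, the tmpfs-versus-disk comparison from step 2 can be roughly sketched in Python. This is only an illustration: the function names are made up for this example, it measures sequential write throughput only (not seek time), and it repeats each run so that the spread between runs is visible.

```python
import os
import statistics
import tempfile
import time


def write_throughput(directory, size_mb=64, repeats=3):
    """Write size_mb MiB into `directory`, fsync it, and return MB/s for each run."""
    block = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    results = []
    for _ in range(repeats):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                for _ in range(size_mb):
                    handle.write(block)
                handle.flush()
                # Force the data out of the page cache, similar to dd conv=fsync.
                os.fsync(handle.fileno())
            elapsed = time.perf_counter() - start
        finally:
            os.remove(path)
        results.append(size_mb / elapsed)
    return results


def compare(directories, size_mb=64, repeats=3):
    """Return {label: (min, median, max)} in MB/s for each labelled directory."""
    summary = {}
    for label, directory in directories.items():
        runs = write_throughput(directory, size_mb, repeats)
        summary[label] = (min(runs), statistics.median(runs), max(runs))
    return summary
```

Something like `compare({"tmpfs": "/dev/shm", "disk": "/var/tmp"})` would then give the two sets of figures side by side. Both paths are assumptions: /dev/shm is usually a tmpfs mount on Linux and /var/tmp is usually on disk, but check your own mounts first. A wide gap between min and max on the same directory is the kind of outlier described above.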

Testing needs to be performed across the stack to find your bottleneck. Don't even consider testing the application layer (i.e. Magento) until you've got consistent, even results for CPU, RAM, disk and network I/O.
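To judge whether results are "consistent", it helps to repeat each low-level probe and look at the spread rather than a single number. Here is a minimal sketch, assuming Python is available on both machines. The probe names (`cpu_ms`, `tcp_connect_ms`) and the hashing workload are invented for illustration; they are not a standard benchmark and not a replacement for tcpping.

```python
import hashlib
import socket
import statistics
import time


def cpu_ms(iterations=100_000):
    """Time a fixed single-threaded hashing workload, in milliseconds."""
    data = b"probe"
    start = time.perf_counter()
    for _ in range(iterations):
        # Chain the digests so every iteration depends on the previous one.
        data = hashlib.sha256(data).digest()
    return (time.perf_counter() - start) * 1000.0


def tcp_connect_ms(host, port, timeout=5.0):
    """Time a TCP handshake with host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


def repeat(probe, count=10, pause=0.2):
    """Run a zero-argument probe `count` times, pausing between runs."""
    samples = []
    for _ in range(count):
        samples.append(probe())
        time.sleep(pause)
    return samples


def summarize(samples):
    """Reduce samples to min/median/max/stdev so that outliers stand out."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```

Running `summarize(repeat(cpu_ms))` on the dev VM and on the EC2 instance gives a crude single-core comparison. `summarize(repeat(lambda: tcp_connect_ms("shop.example.com", 80)))` does the same for handshake latency from the test client; the hostname is a placeholder for your own server. If the stdev is large relative to the median, you are probably seeing contention, and application-level timings taken on that host won't be trustworthy yet.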
