CentOS – How to scale beyond 150 page views per minute

centos, fastcgi, lighttpd, PHP, scalability

I have a Facebook app written in PHP. It currently gets 150 page views per minute and will reach up to 300 page views per minute by the end of this year. As the page views grow I am starting to have scalability problems, so I would like to ask for advice on how to scale so I can successfully handle 300 PV/minute.

My application is a quiz-like app. It is hosted on a VPS that can use:

  • 100% of one core of a 2.6 GHz processor
  • 500 MB of RAM, burstable up to 2 GB (cat /proc/user_beancounters says privvmpages really gives me 500 MB, while free -m shows 2 GB)

The configuration of my VPS looks like this:

  • CentOS 5
  • Lighttpd
  • Memcached
  • APC
  • MySQL
  • PHP using FastCGI

Over the last few months I have managed to optimize the MySQL, Lighttpd and PHP configuration using tutorials found on the internet. I have made extensive use of Memcached, so many requests dropped to about 1 ms, while those not handled by Memcached take up to 300 ms. I have also added good indexes to MySQL, so the database is no longer the first thing users feel.
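
For context, the caching follows the usual cache-aside pattern; a minimal sketch, assuming the pecl/memcache extension and a hypothetical get_quiz_results() database helper (the names and the 60 second TTL are only illustrative):

    <?php
    // Cache-aside sketch: try Memcached first, fall back to MySQL on a miss.
    $cache = new Memcache();
    $cache->connect('127.0.0.1', 11211);

    function cached_quiz_results($cache, $quiz_id) {
        $key = 'quiz_results_' . (int)$quiz_id;
        $data = $cache->get($key);                // roughly the 1 ms path
        if ($data === false) {
            $data = get_quiz_results($quiz_id);   // the slow (up to 300 ms) MySQL path
            $cache->set($key, $data, 0, 60);      // keep for 60 seconds, no compression
        }
        return $data;
    }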

For some time the optimizations above were enough to handle new requests, but lately, due to the growing popularity of the application, I have noticed that some requests take longer than 3 seconds, and during critical bursts my Lighttpd simply gives up and users get an Internal Server Error 500.

I think I have found a way to fix the 500 errors (I will know for sure today) by setting:

"PHP_FCGI_MAX_REQUESTS" => "500"

But the scalability issue itself is still not resolved. I need to be able to handle twice as many requests as now, and I am thinking about how to do that. Here are the solutions I came up with today:

  1. Upgrade the VPS to 3.3 GHz on 2 cores
  2. Buy another VPS and move the database there
  3. Ask someone for help (which is what I am doing now)

My VPS provider offers a bigger plan with 3.3 GHz instead of the 2.6 GHz I have now, and on 2 cores instead of one. It costs more, but will it actually help me? How can I calculate whether it will handle 300 PV?

My second idea is to buy another VPS and move the database there. That should free up CPU and memory for the FastCGI processes on one machine and the database process on the other. But how do I know whether it is better to spin up another server or to buy a bigger plan for the one I already have?
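
If I went that way, the application change itself looks small; a rough sketch, assuming the app still uses mysql_connect() and that 10.0.0.2 stands in for the database VPS's private IP (both are assumptions on my side):

    <?php
    // Point the app at the database VPS instead of localhost.
    $db = mysql_connect('10.0.0.2', 'app_user', 'secret_password');
    mysql_select_db('quiz_app', $db);

    // On the database VPS, MySQL would also have to listen on that interface,
    // e.g. bind-address = 10.0.0.2 in /etc/my.cnf, with the port firewalled so
    // that only the web VPS can reach it.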

So I come to point 3: asking someone. Here I am, a programmer rather than an administrator, with a serious scalability problem, asking for your help.

I would like to know how I can calculate how many PV per minute my current VPS can handle; that alone would help me decide. If 300 PV is beyond what my current VPS can do, I can start thinking about other solutions right away instead of fiddling with the configuration any further.
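
My own back-of-envelope attempt, using the numbers above (the 80/20 split between cached and uncached requests is only a guess): 300 PV per minute is 5 requests per second. If roughly 80% of those come from Memcached in about 1 ms and the remaining 20% cost around 300 ms of CPU each, the PHP side needs about 1 request/s x 0.3 s, i.e. around 30% of one core, plus whatever MySQL and Lighttpd use on top. So raw CPU might not be the real limit, and memory and disk I/O are what I would have to measure.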

Secondly, if my VPS can in principle handle more requests and this is only a configuration issue, then I would need help from someone with more knowledge in this area to get the configuration right. I can post the configuration here or send it by email, and I hope to hear from anyone who has the time and knowledge to help me with it. I do not have time for more experiments of my own in this matter.

Lastly, if it is beyond my VPS's abilities, I would like to know how to decide whether I should upgrade my VPS or spin up another server. Which solution is better for the 300 PV target?

If you have read this far, thank you very much in advance. Your help, advice or contacts to people who can help with this issue will be very welcome!

Best Answer

The killer bottleneck for reasonably specced VPSs is usually disk I/O, as all the VMs running on a given host share the same disk (or array of disks; good VPS hosts will have your VMs on a RAID10 array or similar). In fact, sometimes several hosts' worth of VMs share the same array if they are set up with an external drive array. This is particularly obvious when memory becomes short, as your database queries will always hit disk because there is no RAM left to cache even a core working set of the data.

You might find that getting your own low-spec dedicated server would improve matters simply because your needs can monopolise the raw I/O bandwidth, and you will see less I/O latency because the drive heads are only flipping back and forth for your I/O requests, not for several other machines' worth of requests too. This might even end up costing less than the "run two VPSs" solution, particularly when you consider that in many cases data transfer between VMs counts against the bandwidth quotas of both machines (check with your host; this is not always the case, but unless you are explicitly told otherwise it is safer to assume it is), so you may have increased bandwidth-related costs. You might be surprised how little you can rent a small P4-based machine for, and from your description I doubt CPU power is your bottleneck (memory and I/O contention are the more likely culprits).

500 MB of memory may be a limitation, so going back to the two-VPS idea, splitting into two VMs so your database isn't competing with your FastCGI and memcached processes may help. Similarly, it might simply be worth getting more fixed RAM allocated. I've never had any faith in the idea of "burstable RAM allocation", as I assume each OS will try to use as much RAM as it can for I/O efficiency (though I've never used a host that offers burstable RAM allocation, so I have no direct evidence to back up my lack of faith!). What does the rest of free -m show? Also, what sort of size are your databases? Getting more fixed RAM allocated may help more than moving to a cheap dedicated server (most of the cheaper options come with only 512 MB of physical RAM, though most can also be upgraded for an extra cost), depending on how cramped 512 MB actually is for your needs.
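
To answer the database-size question, a quick way to get a rough per-schema figure is the standard information_schema query (works on MySQL 5.0 and later):

    SELECT table_schema,
           ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
    FROM information_schema.tables
    GROUP BY table_schema;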

Sorry this isn't a particularly straight answer...

To test how RAM-dependent your performance is, you could set up a VM of similar spec on your local machine, duplicate your setup in it, and throw some benchmarking software at it (http://httpd.apache.org/docs/1.3/programs/ab.html is a place to start), then increase the RAM allocated to the VM to see what difference it makes to where the errors start to kick in. You can simulate bad I/O contention too by running a couple of other simple VMs, each performing some sort of I/O benchmark like bonnie++.
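
As a starting point, an ab run along these lines (the URL and numbers are only placeholders) will show requests per second and where the latency starts to blow out:

    # 500 requests, 10 at a time, against a representative page on the test VM
    ab -n 500 -c 10 http://test-vm.local/index.php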
