Load Balancing Apache on a Budget – How To

apache-2.2, high-availability, load balancing, redundancy

I am trying to get my head around the concept of load balancing to ensure availability and redundancy to keep users happy when things go wrong, rather than load balancing for the sake of offering blistering speed to millions of users.

We're on a budget and trying to stick to the stuff where there's plenty of knowledge available, so running Apache on Ubuntu VPSs seems like the strategy until some famous search engine acquires us (Saturday irony included, please note).

At least to me, it's a complete jungle of different solutions. Apache's own mod_proxy and HAProxy are two that we found with a quick Google search, but having zero experience with load balancing, I have no idea what would be appropriate for our situation, or what we should look for when choosing a solution to address our availability concerns.

What is the best option for us? What should we do to keep availability high while staying within our budget?

Best Answer

The solution I use, which can easily be implemented with VPSs, is the following:

  • DNS is round-robined to 6 different valid IP addresses.
  • I have 3 load balancers with identical configurations, using corosync/pacemaker to distribute the 6 IP addresses evenly (so each machine gets 2 addresses).
  • Each of the load balancers runs an nginx + Varnish configuration. nginx receives the connections, does rewrites and some static serving, and passes requests on to Varnish, which does the load balancing and caching.
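
To make the address layout concrete, here is a toy Python model of the outcome: six addresses spread over three balancers, and the same six spread over whichever balancers are still alive. It is only an illustration of the end result, not how corosync/pacemaker actually decides placement. The node names (lb1 to lb3) and the 192.0.2.x documentation addresses are assumptions made up for the example.

```python
def assign_addresses(addresses, balancers):
    """Spread addresses over balancers as evenly as possible (round-robin)."""
    if not balancers:
        raise ValueError("no load balancers available")
    assignment = {lb: [] for lb in balancers}
    for i, addr in enumerate(addresses):
        # Address i goes to balancer i modulo the number of balancers.
        assignment[balancers[i % len(balancers)]].append(addr)
    return assignment


# Hypothetical values: six documentation-range IPs and three balancer names.
ADDRESSES = [f"192.0.2.{n}" for n in range(1, 7)]
BALANCERS = ["lb1", "lb2", "lb3"]

# All three balancers up: two addresses each.
normal = assign_addresses(ADDRESSES, BALANCERS)

# lb2 is down: the survivors take three addresses each.
degraded = assign_addresses(ADDRESSES, [lb for lb in BALANCERS if lb != "lb2"])
```

Because DNS keeps handing out all six addresses, every address has to live on some balancer at all times; the cluster software's job is to keep that true when a node disappears.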

This architecture has the following advantages, in my biased opinion:

  1. corosync/pacemaker will redistribute the IP addresses if one of the load balancers fails.
  2. nginx can be used to terminate SSL and to serve certain types of files (big videos, audio, or other large files) directly from the filesystem or NFS without going through the cache.
  3. Varnish is a very good load balancer, supporting weights and backend health checking, and it does an outstanding job as a reverse proxy.
  4. If more load balancers are needed to handle the traffic, just add more machines to the cluster and the IP addresses will be rebalanced across all of them. You can even automate adding and removing load balancers. That's why I use 6 IPs for 3 machines: to leave some room for growth.
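
As a rough picture of what "weight" and "backend health checking" mean in point 3, the following Python sketch picks a backend at random in proportion to its weight while skipping any backend marked unhealthy. This is not Varnish's actual algorithm (Varnish is configured in VCL, not Python); the `Backend` class, the `pick_backend` function, and the apache1 to apache3 names are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int           # relative share of traffic this backend should get
    healthy: bool = True  # would be set by a periodic health probe


def pick_backend(backends, rng=random):
    """Weighted random choice among healthy backends only."""
    candidates = [b for b in backends if b.healthy and b.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy backends")
    weights = [b.weight for b in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical pool: apache1 should get about three times apache2's traffic,
# and apache3 has failed its health check so it receives nothing.
backends = [
    Backend("apache1", weight=3),
    Backend("apache2", weight=1),
    Backend("apache3", weight=1, healthy=False),
]

chosen = pick_backend(backends)
```

The point of combining the two is that a failed Apache backend simply drops out of rotation, while a larger VPS can be given a bigger share of requests than a smaller one.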

In your case, having physically separated VPSs is a good idea, but it makes the IP sharing more difficult. The objective is a fault-tolerant, redundant system, and some load balancing/HA configurations end up undermining that by adding a single point of failure (such as a single load balancer that receives all the traffic).

I also know you asked about Apache, but these days we have specific tools better suited to the job (like nginx and Varnish). Leave Apache to run the applications on the backend and serve them through other tools. It's not that Apache can't do good load balancing or reverse proxying; it's just a question of offloading different parts of the job to separate services so each one can do its share well.