Tomcat – Load Balancing not working as expected in Apache in round-robin mode

apache-2.2, load-balancing, tomcat

We are facing some unexpected behavior with round-robin load balancing on Apache when one of the Tomcat servers goes down.

Our setup: we have 2 Apache web servers on the front end using the mod_jk module for load balancing with round-robin load distribution. Session stickiness is enabled. The load is balanced among 4 Tomcat servers, on which the applications run.

Sometimes under heavy load, if our database tier slows down, one of the Tomcat servers eventually hangs and needs a restart. The moment we bounce that Tomcat server, we see a spike in requests on one of the other Tomcat servers, which then also hangs and needs a restart.

Eventually all the Tomcat servers hang in a similar fashion.

Why does Apache transfer the whole load to one server instead of distributing it?

We are now trying worker.balancer.method=B (the Busyness method) to see whether it resolves our issue.
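For readers who want to see where that setting lives, here is a minimal workers.properties sketch of a setup like the one described. It is not our actual file: the worker names (balancer, tomcat1 to tomcat4), the host name, and the AJP port are assumptions made for illustration.

```properties
# Hypothetical workers.properties sketch; worker names, host, and port are assumptions.
worker.list=balancer

# Load balancer worker fronting the four Tomcat nodes
worker.balancer.type=lb
worker.balancer.balance_workers=tomcat1,tomcat2,tomcat3,tomcat4
worker.balancer.sticky_session=true
# B = Busyness: prefer the worker with the fewest requests currently in flight
# (the default, R, balances on the number of requests)
worker.balancer.method=B

# One AJP worker per Tomcat node; tomcat2..tomcat4 would be defined the same way
worker.tomcat1.type=ajp13
worker.tomcat1.host=tomcat1.example.internal
worker.tomcat1.port=8009
```

With sticky sessions, each worker name is expected to match the jvmRoute configured on the corresponding Tomcat Engine, so that mod_jk can route a session back to the node that created it.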

In the images below, we see that service threads shoot up:

  • in Server 1 when Server 4 goes down at about 11:50
  • in Server 2 when Server 1 goes down at about 11:55

[Images: service thread graphs for the Tomcat servers, not reproduced here]

Best Answer

(Posting an answer instead of a comment, as it might be too long):

I'm not saying "F5 can handle the issue in a better way", but:

  • I would prefer to have the load balancing done by load balancers: F5 BIG-IP, among other products, was designed to do this job.
  • As you have a small setup (4 Tomcat servers), I see no reason for now to have 2 levels of load balancing. Having only the F5 check that a custom JSP page returns 200 is, IMHO, much simpler.
  • The worst downsides I remember are: when a node is down, some traffic is still directed to it until the next health check (~5 sec by default, IIRC), and sessions are lost if a node goes down (maybe Tomcat offers a workaround, e.g. storing sessions in a database...).

I don't think it's easy to find "neutral" public benchmarks or tests showing which equipment or software is better at load balancing. I can only advise you to run your own if you have a spare F5 for staging.

As a general rule, I'd make the F5 do as much as it can: load balancing, SSL certificates, URL rewriting, ASM, ... Not because F5s are better, but because it is convenient to have everything in the same place. Unfortunately, when HTTP traffic starts growing beyond a few hundred MB, you have to start having some of the work done by Apache instead of the F5.