We are facing some unexpected behavior with round-robin load balancing on Apache when one of the Tomcat servers goes down.
Our setup: we have 2 Apache web servers on the front end that use the mod_jk module for round-robin load balancing, with session stickiness enabled. The load is balanced across 4 Tomcat servers on which the applications run.
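For reference, a setup like this is normally described in mod_jk's workers.properties. The sketch below is a hypothetical reconstruction, not the actual file from this setup: the worker names tomcat1 to tomcat4, the hostnames and the AJP port 8009 are assumptions. Note that with no `method` set, mod_jk uses its default R (Request) method, which balances on weighted request counts rather than strict round-robin.

```properties
# Hypothetical workers.properties for the setup described above.
# Worker names, hostnames and ports are assumptions, not the real config.
worker.list=balancer

# Load-balancer worker that Apache's JkMount directives point at
worker.balancer.type=lb
worker.balancer.balance_workers=tomcat1,tomcat2,tomcat3,tomcat4
# Sticky sessions: a request carrying a session route goes back to the same
# Tomcat. Each Tomcat's jvmRoute (Engine element in server.xml) must match
# its worker name, e.g. jvmRoute="tomcat1".
worker.balancer.sticky_session=true
# No worker.balancer.method line, so the default R (Request) method applies

worker.tomcat1.type=ajp13
worker.tomcat1.host=tomcat1.example.com
worker.tomcat1.port=8009
worker.tomcat1.lbfactor=1

worker.tomcat2.type=ajp13
worker.tomcat2.host=tomcat2.example.com
worker.tomcat2.port=8009
worker.tomcat2.lbfactor=1

worker.tomcat3.type=ajp13
worker.tomcat3.host=tomcat3.example.com
worker.tomcat3.port=8009
worker.tomcat3.lbfactor=1

worker.tomcat4.type=ajp13
worker.tomcat4.host=tomcat4.example.com
worker.tomcat4.port=8009
worker.tomcat4.lbfactor=1
```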
Sometimes, under heavy load, if our database tier slows down, one of the Tomcat servers eventually goes into a hung state and needs a restart. As soon as we bounce that Tomcat server, we see a spike in requests on one of the other Tomcat servers, which then also hangs and needs a restart.
Eventually all the Tomcat servers hang in a similar fashion.
Why does Apache transfer the whole load to a single server instead of distributing it?
We are now trying worker.balancer.method=B to see if it resolves the issue.
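With method=B (Busyness), mod_jk sends each request that has no valid sticky route to the member currently serving the fewest in-flight requests, instead of using the accumulated request counts of the default R method; requests that already carry a session route still go to their own Tomcat. A minimal sketch of the change, assuming the balancer worker is named balancer and the members are named tomcat1 to tomcat4 (hypothetical names). The reply_timeout and ping lines are optional additions, not part of the original setup, meant to make mod_jk stop waiting on an unresponsive Tomcat sooner; the values shown are illustrative assumptions and reply_timeout must stay above your slowest legitimate request.

```properties
# Pick the least busy member (fewest requests in flight) for non-sticky requests
worker.balancer.method=B

# Optional, values are assumptions: give up waiting for a reply after 60 s
# rather than blocking indefinitely on a hung Tomcat
worker.tomcat1.reply_timeout=60000
# Optional: CPing/CPong probes on connect, before requests and on idle
# connections, each waiting up to ping_timeout milliseconds for a CPong
worker.tomcat1.ping_mode=A
worker.tomcat1.ping_timeout=10000
# Repeat the tomcat1 lines for tomcat2, tomcat3 and tomcat4
```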
In the images below, the service threads shoot up:
- on Server 1 when Server 4 goes down at about 11.50
- on Server 2 when Server 1 goes down at about 11.55
Best Answer
(Posting this as an answer instead of a comment, as it might be too long.)
I'm not saying "F5 can handle the issue in a better way", but:
I don't think it's easy to find "neutral" public benchmarks or tests comparing load-balancing equipment and software. If you have a spare F5 for staging, I can only advise you to run your own tests.
As a general rule, I'd have the F5 do as much as it can: load balancing, SSL certificates, URL rewriting, ASM, ... Not because F5s are better, but because it is convenient to have everything in the same place. Unfortunately, when HTTP traffic grows beyond a few hundred MB, you have to start moving some of the work from the F5 to Apache.