Why don't load balancer algorithms allow selection of worker machines based on current CPU or memory usage?

apache-2.2, load-balancing

I'm currently investigating load balancing with Apache's mod_proxy_balancer and mod_proxy. I will also be looking at other load balancers later, but one thing has already become clear: hardly any of the load balancers (if any at all) make distribution decisions based on the actual load of the worker machines.

Apache, for example, distributes requests based on the number of requests, the amount of data throughput, or the length of the request queue. Why don't they have some mechanism to distribute requests to the machine with the lowest CPU or memory usage?
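For context, the three policies mentioned correspond to mod_proxy_balancer's `lbmethod` settings. A minimal configuration sketch (the worker hostnames and the `/app` path are placeholders):

```apache
<Proxy balancer://mycluster>
    BalancerMember http://worker1:8080
    BalancerMember http://worker2:8080
    # lbmethod can be byrequests (request count), bytraffic (bytes
    # transferred), or bybusyness (pending request queue length)
    ProxySet lbmethod=bybusyness
</Proxy>
ProxyPass /app balancer://mycluster
```

Note that even `bybusyness` only counts requests the balancer itself has in flight; none of the three methods look at the workers' actual CPU or memory usage.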

I'm building a system where each request requires so much CPU that two or three worker machines can only service 10 or 20 concurrent clients before I've maxed out all their CPUs. Some requests (for XML) are really lightweight, while others (for 'stuff') are really heavy.

Does it really make any difference in the scheme of things? Does one find that even a CPU-based distribution algorithm settles down into a round-robin style eventually? Does it add extra overhead that makes it not worth it?

Are there any other load balancers that offer this facility? Or do they offer it and no one uses it, for whatever reason?

It seems like something that would be really useful, but no one seems to implement it. I'm confused and need a little advice on the subject.

Best Answer

One of the main issues with resource-based load balancing is that the load information is stale by the time you make the routing decision. There is an academic paper on the topic of staleness that you might want to read, called Interpreting Stale Load Information. You can get nasty side effects, like sending too much load to a box that seems under-utilized and then overwhelming it. In short, load-based balancing seems like the obviously best approach to everyone at first, but it turns out that simple methods tend to work better in practice.
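A toy simulation (my own sketch, not taken from the paper) shows the herd effect: if the balancer only refreshes its view of server load every so often, every request between refreshes goes to the same "least loaded" server, which piles up a huge queue while the others sit idle.

```python
def simulate(num_servers=4, num_requests=200, snapshot_every=50, policy="stale"):
    """Each request adds 1 unit of load; every server drains a little
    between arrivals. 'stale' routes to the least-loaded server *as of
    the last snapshot*; 'roundrobin' just cycles. Returns the peak
    instantaneous load seen on any one server."""
    load = [0.0] * num_servers
    snapshot = list(load)              # the view the balancer acts on
    peak = 0.0
    for i in range(num_requests):
        if i % snapshot_every == 0:
            snapshot = list(load)      # refresh the (soon-to-be-stale) view
        if policy == "stale":
            target = snapshot.index(min(snapshot))
        else:
            target = i % num_servers
        load[target] += 1.0
        peak = max(peak, load[target])
        # every server drains between arrivals
        load = [max(0.0, x - 1.0 / num_servers) for x in load]
    return peak

print(simulate(policy="stale"), simulate(policy="roundrobin"))
```

With these (arbitrary) parameters, the stale-information policy drives one server's queue over thirty times higher than round-robin ever does, even though both policies handle the same total load.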

In most load balancing setups, simple algorithms are fine because either the transactions are short-lived or they cause such low load that a round-robin or random distribution will come close enough to a good balance. There generally needs to be spare capacity to absorb the load from failed servers anyway (if you are close to max utilization on all three machines, then as soon as one dies the load will cascade and you lose the whole cluster).
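The headroom argument can be made concrete: with n servers each running at average utilization u, losing one server pushes u·n/(n−1) onto each survivor, so the cluster only survives if u stays below (n−1)/n. A small helper (my own illustration, assuming load redistributes evenly):

```python
def max_safe_utilization(n_servers, failures_tolerated=1):
    """Highest average per-server utilization such that the surviving
    servers can still absorb the full cluster load after
    'failures_tolerated' servers die."""
    survivors = n_servers - failures_tolerated
    if survivors <= 0:
        raise ValueError("cannot tolerate that many failures")
    # total load n*u must fit on the survivors: n*u <= survivors * 1.0
    return survivors / n_servers

print(max_safe_utilization(3))
```

For the three-machine cluster described in the question, that means running each box no hotter than about 67% if you want to survive a single failure.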

One solution might be to create two queues: one for the "heavy stuff" and one for the "light stuff". I would call the "light stuff" load balancing and the "heavy stuff" job scheduling; in other words, they seem like different problems. Then just limit the maximum number of sessions per client and feed the heavy jobs through a shared queue for the job scheduling. I don't know of an ideal tool for that off the top of my head, though.