HAProxy URI balancing isn’t very balanced

haproxy varnish

I'm attempting to use HAProxy 1.4.22 with URI balancing and hash-type consistent to load balance between three Varnish cache backends. My understanding is that this will never achieve a perfect balance between servers, but it should be better than the results I'm seeing.

The relevant part of my HAProxy config looks like:

backend varnish
    # hash balancing
    balance uri
    hash-type consistent

    server varnish1 10.0.0.1:80 check observe layer7 maxconn 5000 id 1 weight 75 
    server varnish2 10.0.0.2:80 check observe layer7 maxconn 5000 id 2 weight 50
    server varnish3 10.0.0.3:80 check observe layer7 maxconn 5000 id 3 weight 50

I've been self-testing by pointing my own hosts file at the new proxy server. I even tried re-routing the popular homepage to a separate backend that's balanced round-robin, to get that outlier off the hash-balanced backend, and that seems to work fine. I boosted varnish1 to a weight of 75 as a test, but it didn't seem to help. My load is still balanced very disproportionately and I don't understand why.

[Screenshot: close-up stats]

[Screenshot: full stats]

One interesting tidbit is that if I reverse the IDs, the server with the higher ID will ALWAYS get the lion's share of the traffic. Why would the ID affect balancing?

Tweaking weights is well and good, but as my site's traffic patterns change (we are a news site and the most popular post can change rapidly) I don't want to have to constantly tweak weights. I understand it'll never be in perfect balance, but I was expecting better results than having one server with a lower weight getting 25 times more connections than another server with a higher weight.

My goal has been to reduce DB and app server load by reducing duplication at the cache level, which is what HAProxy URI balancing is recommended for, but if it's going to be this out of balance it won't work for me at all.

Any advice?

Best Answer

I'm not sure if this is very helpful, but I've struggled a bit with the same problem, and this is what I've concluded:

Hash-based load balancing will, as you've already established, never give you perfect load balancing. The behavior you see can be explained simply by a few of the most visited or largest pages landing on the same server. When a few pages get a lot of traffic and a lot of pages get very little, that alone is enough to skew the statistics.

Your configuration uses consistent hashing. The IDs and weights of the servers determine which server a hashed entry is finally directed to, which is why changing them affects your balancing. The documentation is pretty clear that, even though this is a good algorithm for balancing caches, you may need to change the IDs around and increase the total weight of the servers to get a more even distribution.
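To make the role of IDs and weights a bit more concrete, here is a minimal sketch of a generic consistent-hash ring. It is not HAProxy's actual code: the MD5-based `point` function, the `"id:index"` key format and the helper names `build_ring`, `pick` and `ring_share` are all assumptions made for illustration. The only thing it is meant to show is that each server's ring positions are derived from its ID, and the number of positions from its weight, so changing either one reshuffles who owns which part of the hash space.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32


def point(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def build_ring(servers):
    """servers maps name -> (server_id, weight); returns sorted (position, name) pairs."""
    ring = []
    for name, (server_id, weight) in servers.items():
        # One ring point per unit of weight, positioned from the server ID.
        for i in range(weight):
            ring.append((point(f"{server_id}:{i}"), name))
    return sorted(ring)


def pick(ring, uri):
    """Return the server owning the first ring point at or after hash(uri)."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, point(uri)) % len(ring)
    return ring[idx][1]


def ring_share(ring):
    """Fraction of the hash space that lands on each server."""
    share = {}
    for i, (pos, name) in enumerate(ring):
        prev = ring[i - 1][0]  # i == 0 wraps around to the last point
        share[name] = share.get(name, 0) + (pos - prev) % HASH_SPACE
    return {name: arc / HASH_SPACE for name, arc in share.items()}


# IDs and weights from the question, then the same servers with IDs reversed.
question = {"varnish1": (1, 75), "varnish2": (2, 50), "varnish3": (3, 50)}
reversed_ids = {"varnish1": (3, 75), "varnish2": (2, 50), "varnish3": (1, 50)}

for label, servers in (("original IDs", question), ("reversed IDs", reversed_ids)):
    print(label, ring_share(build_ring(servers)))
```

With only a few dozen points per server the arcs on the ring can be quite uneven, which is one plausible reason why raising the total weight (more points per server) tends to smooth things out.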

If you take a large sample of unique addresses (more than 1000) and visit each of them once, you should see that the session counter is a lot more equal across the three backends than it is under 'ordinary' traffic, because ordinary traffic is also shaped by the traffic pattern of the site.
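A rough way to see this without touching the proxy is to simulate it. The sketch below is only an illustration under assumptions: it uses a plain MD5-modulo hash rather than HAProxy's algorithm, made-up `/article/N` URLs, and an invented popularity curve where the page at rank r gets 10000 // r hits. It compares one visit per unique URL against that skewed traffic.

```python
import hashlib
from collections import Counter

SERVERS = ["varnish1", "varnish2", "varnish3"]


def server_for(uri: str) -> str:
    """Pick a backend from a hash of the URI (simple modulo, for illustration)."""
    digest = hashlib.md5(uri.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def distribute(hits_per_uri):
    """Sum the hits that each backend would receive."""
    totals = Counter({s: 0 for s in SERVERS})
    for uri, hits in hits_per_uri.items():
        totals[server_for(uri)] += hits
    return totals


# Hypothetical URL set: 1000 distinct articles.
uris = [f"/article/{n}" for n in range(1, 1001)]

# Case 1: every URL is visited exactly once.
uniform = distribute({u: 1 for u in uris})

# Case 2: a few pages dominate (rank 1 gets 10000 hits, rank 2 gets 5000, ...).
skewed = distribute({u: 10000 // rank for rank, u in enumerate(uris, start=1)})

print("one hit per URL:", dict(uniform))
print("skewed traffic: ", dict(skewed))
```

Whichever backend happens to own the top-ranked page starts with 10000 hits before anything else is counted, so the skewed totals tend to be far less even than the one-hit-per-URL totals, even though the URL-to-server mapping is identical in both cases.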

My advice would be to make sure that you hash the entire URL, not just what's to the left of the "?". This is controlled by using balance uri whole in the configuration; see the HAProxy documentation. If you have a lot of URLs that share the same base but have varying GET parameters, this will definitely give you improved results.
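As a small illustration of why this matters, the sketch below compares hashing only the part before the "?" with hashing the whole URI. The `hash_key` and `server_for` helpers, the MD5-modulo hash and the `/search?q=N` URLs are assumptions for the example and do not reproduce HAProxy's own hashing.

```python
import hashlib

SERVERS = ["varnish1", "varnish2", "varnish3"]


def hash_key(uri: str, whole: bool = False) -> str:
    """Return the part of the URI that gets hashed."""
    # Without `whole`, everything from the "?" onward is ignored.
    return uri if whole else uri.split("?", 1)[0]


def server_for(key: str) -> str:
    """Pick a backend from a hash of the key (simple modulo, for illustration)."""
    digest = hashlib.md5(key.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


# Hypothetical URLs that share one base path and differ only in the query string.
uris = [f"/search?q={n}" for n in range(100)]

path_only_keys = {hash_key(u) for u in uris}
whole_keys = {hash_key(u, whole=True) for u in uris}

print("distinct keys, path only:", len(path_only_keys))
print("distinct keys, whole URI:", len(whole_keys))
print("backends used, path only:", {server_for(k) for k in path_only_keys})
print("backends used, whole URI:", {server_for(k) for k in whole_keys})
```

When only the path is hashed, all 100 URLs collapse into a single key and therefore a single backend. With the whole URI hashed, each of them is a separate key that can land on any backend.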

I would also take into consideration how the load balancing affects the capacity of your cache servers. If the imbalance doesn't actually affect redundancy in any way, I wouldn't worry too much about it, as perfect load balancing isn't something you are likely to achieve with URI hashing.

I hope this helps.