HAProxy TCP roundrobin load balancing not working as expected

haproxy

I'm new to HAProxy and am using it for TCP load balancing of rsyslog traffic to ArcSight connectors. For the life of me, I cannot get traffic to balance evenly across all of the nodes in the pool (which is the desired behavior). I've tried many permutations of weight and maxconn settings, to no avail.

It feels like this should be an easy problem, but the behavior of the individual nodes in the pool is very confusing. Also, since most people use HAProxy for HTTP load balancing, I'm finding scant documentation about the best way to do what I am trying to do.

Anyone have any insights, proven configs, or troubleshooting steps to recommend?

Thanks!

Here is our current config:

global
    log 127.0.0.1       local0
    log 127.0.0.1       local1 notice
    maxconn 256000
    user haproxy
    group haproxy
    spread-checks 5
    daemon
    quiet

defaults
    log global
    option dontlognull
    option redispatch
    option allbackups
    maxconn 256000
    timeout connect 5000

listen stats :1936
    mode http
    stats enable
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth admin:savetheday

frontend rsyslog_netscreen
    bind 127.0.0.1:8514
    mode tcp
    option tcplog
    option contstats
    option tcpka
    default_backend rsyslog_netscreen_backend

backend rsyslog_netscreen_backend
    balance roundrobin
    mode tcp
    option tcpka
    option srvtcpka
    server netscreen1 localhost:9515 weight 1 maxconn 1024 check
    server netscreen2 localhost:9516 weight 1 maxconn 1024 check
    server netscreen3 localhost:9517 weight 1 maxconn 1024 check
    server netscreen4 localhost:9518 weight 1 maxconn 1024 check
    server netscreen5 localhost:9519 weight 1 maxconn 1024 check
    server netscreen6 localhost:9520 weight 1 maxconn 1024 check

Best Answer

Note that roundrobin is not a good strategy for achieving even load. It makes sure that each backend server receives the same number of connections over time, but it does not take into account how long each connection lasts.

In your stats view, it should be apparent that the total number of sessions per backend server is almost equal (if their uptimes are equal). The number of current sessions can vary quite a bit, though.

We have found that using leastconn instead of roundrobin yields a much more even load. This makes sense: a server that is already holding many long-lived client connections is not burdened with new incoming ones until the other servers have caught up.
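
For the backend in the question, the change could look roughly like the sketch below. It assumes everything else in the config stays as posted; only the balance line differs, and the server names, ports, and limits are copied from the question. Connections that are already established stay on the server they were assigned to, so the distribution should only even out as clients reconnect.

```
backend rsyslog_netscreen_backend
    mode tcp
    # Pick the server with the fewest active connections for each new
    # connection, instead of rotating through the servers in order
    balance leastconn
    option tcpka
    option srvtcpka
    server netscreen1 localhost:9515 weight 1 maxconn 1024 check
    server netscreen2 localhost:9516 weight 1 maxconn 1024 check
    server netscreen3 localhost:9517 weight 1 maxconn 1024 check
    server netscreen4 localhost:9518 weight 1 maxconn 1024 check
    server netscreen5 localhost:9519 weight 1 maxconn 1024 check
    server netscreen6 localhost:9520 weight 1 maxconn 1024 check
```

After reloading, the current-sessions column in the stats view is the one to watch; total sessions per server may still differ.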