Haproxy throughput performance in a VM

haproxyload balancingnetworkingvirtual-machines

I'm new to HAProxy, but I was able to get things functional, although I am not getting the throughput that I expect.

My setup includes a 5 node storage cluster (running Riak CS, an S3 style storage) and another VM running HAProxy with roundrobin. All VMs are using VirtualBox, each on a dedicated machine. The HAProxy VM is on Windows 7, while the others are Windows 10. This is all on the same gigabit network and I tested both 1.4 and 1.5 for HAProxy.

I have a script (python + boto) that downloads without saving to disk, 100 MB files over and over, which I believe should each boil down to a simple get request, and I run that script locally 3 times for added load.

If I point them all to the ip of HAProxy, I get approximately 4-500 Mbps download while I am able to get up to 900+ Mbps if I do NOT use HAProxy and point each to a separate ip.

When doing this test, it looks like the HAProxy is maxing out a single core and becoming bottle necked. For a single script running, which I believe is a single connection, it is eating 20-40% of a cores cpu on HAProxy, which seems suspicious?

It sounds like HAProxy can handle high throughput so I am trying to debug where I might have set things up incorrectly, either in Ubuntu or in the HAProxy config file. I see minimal improvement using nbproc 3 in the config, the load is definitely not balanced across the 3 processes, as one is still maxed out.

Does anything about this setup, such as the VMs, jump out as potential red flag? Does my haproxy config sound culprit? Or my general Ubuntu settings? Also worth asking, is this a good or bad use case for HAProxy?

EDIT

I have some further digging to do, but my current feeling is that this is VM specific, potentially in the ethernet driver (e1000)? I was able to move the HAProxy setup to a physical machine (not a VM) and on a single core, it barely registered any cpu usage with my previous test case…

full config

global
    #log 127.0.0.1   local0
    #log 127.0.0.1   local1 notice
    maxconn 256000
    spread-checks 5
    daemon 
    nbproc 4 
    cpu-map 1 2
    cpu-map 2 3
    cpu-map 3 4
    cpu-map 4 5

defaults
    option dontlog-normal
    option  redispatch
    option  allbackups
    no option   httpclose
    retries 3
    maxconn 256000
    contimeout  5000
    clitimeout  5000
    srvtimeout  5000

    option forwardfor except 127.0.0.1

frontend riak_cs
    bind          *:8098
    bind          *:8080
    mode          http
    capture       request header Host len 64

    acl d1 dst_port 8098
    acl d2 dst_port 8080

    use_backend   riak_cs_backend_stats if d1
    use_backend   riak_cs_backend if d2



backend riak_cs_backend
    mode http 
    balance roundrobin
    option httpchk GET /riak-cs/ping
    timeout connect 60s
    timeout http-request 60s

    stats enable
    stats uri /haproxy?stats

    server riak1 192.168.80.105:8080 weight 1 maxconn 1024 check inter 5s 
    server riak2 192.168.80.106:8080 weight 1 maxconn 1024 check inter 5s
    server riak3 192.168.80.107:8080 weight 1 maxconn 1024 check inter 5s
    server riak4 192.168.80.108:8080 weight 1 maxconn 1024 check inter 5s
    server riak5 192.168.80.109:8080 weight 1 maxconn 1024 check inter 5s

backend riak_cs_backend_stats 
    mode http
    balance roundrobin
    timeout connect 60s
    timeout http-request 60s

    stats enable 
    stats uri /haproxy?stats

    server riak1 192.168.80.105:8098 weight 1 maxconn 1024 
    server riak2 192.168.80.106:8098 weight 1 maxconn 1024
    server riak3 192.168.80.107:8098 weight 1 maxconn 1024
    server riak4 192.168.80.108:8098 weight 1 maxconn 1024 
    server riak5 192.168.80.109:8098 weight 1 maxconn 1024 

Best Answer

I hate to answer my own question, but I think my conclusion is that my test is VM limited. I can't quite say exactly how so, but the cpu usage of HAProxy through my vm was much, much higher, and as I noted above, testing on physical hardware with the same config, even removing the nbproc part, I see a a barely noticable CPU load in HAProxy.

It wasn't my goal to run anything production through VMs, but they are much more convenient for testing (while waiting on actual hardware) and to learn how this stuff works.