I'm new to HAProxy, but I was able to get things functional, although I am not getting the throughput that I expect.
My setup includes a 5 node storage cluster (running Riak CS, an S3 style storage) and another VM running HAProxy with roundrobin
. All VMs are using VirtualBox, each on a dedicated machine. The HAProxy VM is on Windows 7, while the others are Windows 10. This is all on the same gigabit network and I tested both 1.4 and 1.5 for HAProxy.
I have a script (python + boto) that downloads without saving to disk, 100 MB files over and over, which I believe should each boil down to a simple get request, and I run that script locally 3 times for added load.
If I point them all to the ip of HAProxy, I get approximately 4-500 Mbps download while I am able to get up to 900+ Mbps if I do NOT use HAProxy and point each to a separate ip.
When doing this test, it looks like the HAProxy is maxing out a single core and becoming bottle necked. For a single script running, which I believe is a single connection, it is eating 20-40% of a cores cpu on HAProxy, which seems suspicious?
It sounds like HAProxy can handle high throughput so I am trying to debug where I might have set things up incorrectly, either in Ubuntu or in the HAProxy config file. I see minimal improvement using nbproc 3
in the config, the load is definitely not balanced across the 3 processes, as one is still maxed out.
Does anything about this setup, such as the VMs, jump out as potential red flag? Does my haproxy config sound culprit? Or my general Ubuntu settings? Also worth asking, is this a good or bad use case for HAProxy?
EDIT
I have some further digging to do, but my current feeling is that this is VM specific, potentially in the ethernet driver (e1000)? I was able to move the HAProxy setup to a physical machine (not a VM) and on a single core, it barely registered any cpu usage with my previous test case…
full config
global
#log 127.0.0.1 local0
#log 127.0.0.1 local1 notice
maxconn 256000
spread-checks 5
daemon
nbproc 4
cpu-map 1 2
cpu-map 2 3
cpu-map 3 4
cpu-map 4 5
defaults
option dontlog-normal
option redispatch
option allbackups
no option httpclose
retries 3
maxconn 256000
contimeout 5000
clitimeout 5000
srvtimeout 5000
option forwardfor except 127.0.0.1
frontend riak_cs
bind *:8098
bind *:8080
mode http
capture request header Host len 64
acl d1 dst_port 8098
acl d2 dst_port 8080
use_backend riak_cs_backend_stats if d1
use_backend riak_cs_backend if d2
backend riak_cs_backend
mode http
balance roundrobin
option httpchk GET /riak-cs/ping
timeout connect 60s
timeout http-request 60s
stats enable
stats uri /haproxy?stats
server riak1 192.168.80.105:8080 weight 1 maxconn 1024 check inter 5s
server riak2 192.168.80.106:8080 weight 1 maxconn 1024 check inter 5s
server riak3 192.168.80.107:8080 weight 1 maxconn 1024 check inter 5s
server riak4 192.168.80.108:8080 weight 1 maxconn 1024 check inter 5s
server riak5 192.168.80.109:8080 weight 1 maxconn 1024 check inter 5s
backend riak_cs_backend_stats
mode http
balance roundrobin
timeout connect 60s
timeout http-request 60s
stats enable
stats uri /haproxy?stats
server riak1 192.168.80.105:8098 weight 1 maxconn 1024
server riak2 192.168.80.106:8098 weight 1 maxconn 1024
server riak3 192.168.80.107:8098 weight 1 maxconn 1024
server riak4 192.168.80.108:8098 weight 1 maxconn 1024
server riak5 192.168.80.109:8098 weight 1 maxconn 1024
Best Answer
I hate to answer my own question, but I think my conclusion is that my test is VM limited. I can't quite say exactly how so, but the cpu usage of HAProxy through my vm was much, much higher, and as I noted above, testing on physical hardware with the same config, even removing the
nbproc
part, I see a a barely noticable CPU load in HAProxy.It wasn't my goal to run anything production through VMs, but they are much more convenient for testing (while waiting on actual hardware) and to learn how this stuff works.