I have three back-end servers load balanced with HAProxy.
When all three are up, I see each server's id (1, 2, 3) consecutively in the HTTPS responses, i.e. 1,2,3,1,2,3, as expected.
However, if I take one back-end server down, I then see only one server id repeatedly.
E.g. if I take server 1 down, I then see only server id 2 (2,2,2,2…) when I should see server ids 2 and 3 (2,3,2,3,2,3…).
I have tried both leastconn and round robin balancing, and both exhibit the same behaviour.
Why does HAProxy behave this way? How do I alter my HAProxy config to load balance across the remaining live back-end servers?
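For reference, the rotation I expect from round robin over the live set can be sketched in plain shell (an illustration of the expected behaviour only, not HAProxy code):

```shell
# Round-robin over live servers only: with server 1 down, the rotation
# should become 2,3,2,3,... rather than sticking to a single server.
live="2 3"          # server 1 is down
seq=""
i=0
for n in 1 2 3 4 5 6; do
  set -- $live            # positional params = live servers
  shift $(( i % $# ))     # rotate through them
  seq="$seq$1,"
  i=$(( i + 1 ))
done
echo "$seq"               # prints 2,3,2,3,2,3,
```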
The HAProxy version is 1.6.9, running on Ubuntu 14.04.
The service behind it is a RESTful API that returns JSON over HTTPS, so no session persistence is required or wanted.
The HAProxy configuration is below.
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 3072

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH$
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option dontlog-normal
    option redispatch
    retries 3
    maxconn 3072
    timeout connect 5000
    timeout client 10000
    timeout server 10000

frontend apis-frontend
    bind x.x.x.x:443 ssl verify none crt /etc/haproxy/xxx
    mode http
    option forwardfor
    default_backend apis

backend apis
    balance leastconn
    mode http
    option httpchk GET /
    server 1 x.x.x.x:443 ssl verify none check
    server 2 x.x.x.x:443 ssl verify none check
    server 3 x.x.x.x:443 ssl verify none check

listen stats
    bind *:8181
    mode http
    stats enable
    stats uri /
    stats realm Haproxy\ Statistics
    stats auth xx
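The per-server status can also be pulled at any time from the admin socket declared in the global section. A sketch, assuming socat is installed (it is not part of the config above):

```shell
# Dump the full stats CSV over HAProxy's admin socket (requires socat).
echo "show stat" | socat stdio /run/haproxy/admin.sock

# Or just the name and status of each server in the "apis" backend
# (svname is CSV field 2, status is field 18):
echo "show stat" | socat stdio /run/haproxy/admin.sock | \
  awk -F, '$1 == "apis" { print $2, $18 }'
```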
Edit: added stats.
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
apis-frontend,FRONTEND,,,0,2,3072,24,9118,10021,0,0,19,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,4,,,,0,24,0,19,0,0,,0,6,43,,,0,0,0,0,,,,,,,,
apis,mt-wol-vlx-vps-01,0,0,0,1,,12,4863,3240,,0,,0,0,0,0,UP,1,1,0,1,0,12097,0,,1,3,1,,12,,2,0,,2,L7OK,200,36,0,12,0,0,0,0,0,,,,0,0,,,,,39,OK,,0,1,1,29,
apis,mt-lon-vlx-vps-01,0,0,0,1,,12,4255,3228,,0,,0,0,0,0,UP,1,1,0,0,0,12097,0,,1,3,2,,12,,2,0,,2,L7OK,200,23,0,12,0,0,0,0,0,,,,0,0,,,,,39,OK,,0,1,1,5,
apis,mt-cov-uks-vps-01,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,12094,12094,,1,3,3,,0,,2,0,,0,L4TOUT,,2001,0,0,0,0,0,0,0,,,,0,0,,,,,-1,,,0,0,0,0,
apis,BACKEND,0,0,0,1,308,24,9118,6468,0,0,,0,0,0,0,UP,2,2,0,,0,12097,0,,1,3,0,,24,,1,0,,3,,,,0,24,0,0,0,0,,,,,0,0,0,0,0,0,39,,,0,1,1,33,
stats,FRONTEND,,,1,4,3072,10,3811,169181,0,0,5,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,4,,,,0,9,0,5,0,0,,1,3,15,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,0,308,0,3811,169181,0,0,,0,0,0,0,UP,0,0,0,,0,12097,0,,1,4,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0,,,0,0,0,18,
Edit: load-balancing behaviour in different browsers
After a test request from a different IP, I noticed the second IP load balanced as expected. I then tried from the first IP but using different browsers, Edge and Firefox. Both load balanced as expected.
However, the original browser, Chrome, still sticks to one back-end server, even after closing all Chrome windows and restarting Chrome.
Best Answer
After much further testing, the behaviour described appears only when using Google's Chrome browser. Why Chrome behaves this way is undetermined.
The observed behaviour was not an HAProxy issue.
This was tested multiple times with other browsers and other clients, with the final proof being live service over several hours.
The HAProxy config above has now been running for several hours and is fairly distributing traffic amongst the three servers. This is evidenced by our DB records, which record the server id. Over a period of several hours, each server was assigned the same number of requests, within ±50 out of roughly 4,500 requests.
More importantly, given the original question, it also continues to distribute fairly amongst the remaining two servers in the event of a failure of one of them. Again this is evidenced by our DB records. When measured over 15 minutes, a clear pattern of equally balanced server ids can be seen.
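A similar spot-check can be done from the command line, taking browser connection reuse out of the picture entirely, since each curl invocation opens a fresh TLS connection. This is only a sketch: the URL and the .serverId filter are placeholders for the real endpoint and JSON field, and it assumes jq is installed:

```shell
# Fire 30 requests, one connection each, and tally the returned server ids.
# Replace the URL and the jq filter with the real endpoint and JSON field.
for i in $(seq 1 30); do
  curl -sk https://x.x.x.x/ | jq -r '.serverId'
done | sort | uniq -c
```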
A poignant reminder to design testing to pinpoint where an issue lies before assuming it lies with the part(s) of the solution you don't know very well yet.
Thank you for the many replies it took for that realisation to dawn.