Nginx – Haproxy: Slow responses for static content

haproxy, nginx

We're using HAProxy for load balancing. Our application runs on 10 Caucho Resin servers, with 7 dedicated Nginx servers for static content.

Versions:

HA-Proxy version 1.5-dev19 2013/06/17
nginx version: nginx/1.4.6 (Ubuntu)

My problem at the moment is very slow responses for static content such as JS, CSS, and image files. When I fetch files with curl or wget through HAProxy, response times are too high (~3 seconds or more), but when I fetch the same files directly from the Nginx servers, responses take ~300 ms to ~600 ms, which is far better.
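
To see where the extra time goes in a curl comparison like this, curl's `-w` option can print per-phase timings. The following is a minimal sketch, not what was actually run: the helper name `time_request` and the file path in the usage comments are hypothetical, while the hostnames and backend address come from the configuration in this post.

```shell
# Hypothetical helper: print connect time, time to first byte, and total time.
# The %{...} names are standard curl --write-out variables.
CURL_TIMING_FORMAT='connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n'

time_request() {
    # -s: silent, -o /dev/null: discard the body, -w: print the timings
    curl -s -o /dev/null -w "$CURL_TIMING_FORMAT" "$@"
}

# Usage (the path /js/app.js is a placeholder, not a real file):
#   through HAProxy:  time_request http://static.mydomain.com/js/app.js
#   direct to Nginx:  time_request -H 'Host: static.mydomain.com' http://192.168.1.54:67/js/app.js
```

If `connect` is similar on both paths but `first_byte` is much larger through HAProxy, the delay is probably added between the balancer and the backend rather than on the client side.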

I ran a small test with ab: 10 requests with a concurrency of 10.

Through HAProxy:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      336  340   2.7    341     343
Processing:   425  779 738.3    471    2687
Waiting:      209  459 710.0    241    2479
Total:        760 1119 739.7    812    3030

Percentage of the requests served within a certain time (ms)
  50%    812
  66%    824
  75%    826
  80%   1782
  90%   3030
  95%   3030
  98%   3030
  99%   3030
 100%   3030 (longest request)

Direct Nginx connection:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      193  202   3.9    203     205
Processing:   424  437  10.8    438     455
Waiting:      220  239  12.1    241     257
Total:        620  639  13.4    641     659

Percentage of the requests served within a certain time (ms)
  50%    641
  66%    646
  75%    647
  80%    654
  90%    659
  95%    659
  98%    659
  99%    659
 100%    659 (longest request)

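As you can see, there is a problem: through HAProxy the mean total time is 1119 ms and the slowest request takes 3030 ms, versus 639 ms and 659 ms when connecting to Nginx directly.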

Both the Nginx servers and the HAProxy server have tuned sysctl settings and raised ulimits, and they run on Intel Gigabit cards and Cisco Catalyst switches.

Haproxy's sysctl:

#-------------------------------------------
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1

net.ipv4.neigh.default.gc_interval = 3600
net.ipv4.neigh.default.gc_stale_time = 3600
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024

net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65023
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.core.somaxconn = 16384
net.ipv4.tcp_fin_timeout = 12

net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_max_syn_backlog = 2048

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 50576   64768   98152
net.core.netdev_max_backlog = 2048

#----------------------------------------

Nginx's sysctl:

net.ipv4.tcp_fin_timeout = 15
kernel.sysrq = 1
kernel.panic = 20
kernel.panic_on_oops = 5
fs.file-max = 200000
net.ipv4.ip_local_port_range = 2000 65000
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_max_syn_backlog = 3240000
net.core.somaxconn = 3240000
net.ipv4.tcp_max_tw_buckets = 1440000
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_congestion_control = cubic

And HAproxy's configuration:

global
    maxconn 160000
    spread-checks   5
    user haproxy
    group haproxy
    daemon
    log 127.0.0.1 local0 warning
    stats socket /etc/haproxy/haproxysock level admin
defaults
    log global
    mode http
    balance leastconn
    option redispatch # any server can handle any session
    option http-server-close
    timeout client 20s
    timeout connect 5s
    timeout server 30s
    timeout queue 25s
    timeout check 2s
    timeout http-request 15s
    timeout http-keep-alive 5s
    maxconn 160000
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 503 /etc/haproxy/errors/504.http
    errorfile 408 /dev/null

frontend incoming my-public-ip:80
    reqadd X-Forwarded-Proto:\ http

    stick-table type ip size 1m expire 1m store gpc0,http_req_rate(10s),http_err_rate(10s)
    tcp-request connection track-sc1 src
    tcp-request connection reject if { src_get_gpc0 gt 0 }
    http-request deny if { src_get_gpc0 gt 0 }


    #Different ACL for JS/CSS/Fonts
    acl staticjs    path_end .js    
    acl staticjs    path_end .css
    acl staticjs    path_end .woff


    #Static
    acl staticacl hdr_dom(host) -i static.mydomain.com
    acl staticacl hdr_dom(host) -i static.myotherdomain.com

    #Dynamic
    acl dynacl hdr_dom(host) -i mydomain.com
    acl dynacl hdr_dom(host) -i www.mydomain.com
    acl dynacl hdr_dom(host) -i myanotherdomain.com
    acl dynacl hdr_dom(host) -i www.myanotherdomain.com



    use_backend static if staticacl
    use_backend dynamic if dynacl
    use_backend staticcssjs if staticjs
    default_backend dynamic

backend dynamic :80
    acl abuse src_http_req_rate(incoming) ge 700
    acl flag_abuser src_inc_gpc0(incoming)
    tcp-request content reject if abuse flag_abuser
    http-request deny if abuse flag_abuser
    option  http-server-close
    server resin6 192.168.1.75:8080 check
    server resin6-2 192.168.1.75:8081 check
    server resin5 192.168.1.73:8080 check
    server resin5-2 192.168.1.73:8081 check
    server resin4 192.168.1.59:8080 check
    server resin4-2 192.168.1.59:8081 check
    server resin3 192.168.1.53:8080 check
    server resin3-2 192.168.1.53:8081 check
    server resin2 192.168.1.52:8080 check
    server resin2-2 192.168.1.52:8081 check


backend static :80
    option abortonclose 
    acl abuse src_http_req_rate(incoming) ge 2300
    acl flag_abuser src_inc_gpc0(incoming)
    tcp-request content reject if abuse flag_abuser
    http-request deny if abuse flag_abuser

    server cache1 192.168.1.54:81 check weight 100
    server cache2 192.168.1.55:81 check weight 100
    server cache3 192.168.1.68:81 check weight 100
    server cache4 192.168.1.69:81 check weight 100
    server static1 192.168.1.54:67 check weight 80
    server static2 192.168.1.55:67 check weight 80
    server static3 192.168.1.68:67 check weight 80
    server static4 192.168.1.69:67 check weight 80

backend staticcssjs :80
    option abortonclose
    acl abuse src_http_req_rate(incoming) ge 2300
    acl flag_abuser src_inc_gpc0(incoming)
    tcp-request content reject if abuse flag_abuser
    http-request deny if abuse flag_abuser
    server static5  192.168.1.74:67 check weight 50
    server static6  192.168.1.82:67 check weight 50
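
Since the configuration above enables an admin stats socket at /etc/haproxy/haproxysock, one way to check whether requests are queuing inside HAProxy is to query `show stat` and look at the current queue (`qcur`) and current sessions (`scur`) per server. This is a sketch under assumptions: it presumes `socat` is installed, and the helper name `summarize_haproxy_stat` is made up for illustration.

```shell
# Hypothetical helper: reduce HAProxy's "show stat" CSV to proxy/server,
# current queue length (field 3) and current sessions (field 5).
summarize_haproxy_stat() {
    # Skip the "# pxname,svname,..." header line and any blank lines.
    awk -F, '!/^#/ && NF > 5 { printf "%s/%s qcur=%s scur=%s\n", $1, $2, $3, $5 }'
}

# Usage on the balancer (assumes socat is available):
#   echo "show stat" | socat stdio /etc/haproxy/haproxysock | summarize_haproxy_stat
```

A non-zero `qcur` on the static backends would suggest that requests are waiting in HAProxy's queue rather than being slow on the Nginx side.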

Has anyone experienced something similar? This is driving me crazy. We currently have ~15k connections to our balancer:

root@haproxy:~# netstat -talpn | grep mypublicip | wc -l
15656

Most of them are TIME_WAIT connections:

root@haproxy:~# netstat -talpn | grep mypublicip | grep WAIT | wc -l
14472
root@haproxy:~# netstat -talpn | grep mypublicip | grep EST | wc -l
1172
root@haproxy:~# netstat -talpn | grep mypublicip | grep LISTEN | wc -l
2
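
Note that `grep WAIT` also matches CLOSE_WAIT and FIN_WAIT1/FIN_WAIT2, so the TIME_WAIT figure above may be slightly inflated. A per-state breakdown avoids that ambiguity. The sketch below assumes the usual `netstat -tan` column layout, where the state is the sixth field; the function name `count_tcp_states` is hypothetical.

```shell
# Hypothetical helper: count TCP connections per state from netstat output on stdin.
count_tcp_states() {
    # Only lines starting with "tcp" are connections; field 6 is the state.
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on the balancer:
#   netstat -tan | grep mypublicip | count_tcp_states
```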

The output of vmstat 1 does not show any problem:

root@haproxy:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 6886204 395364 527264    0    0     0     1    0    0  1  3 96  0
 0  0      0 6883684 395364 527288    0    0     0     0 17458 10061  2  4 94  0
 0  0      0 6885056 395364 527288    0    0     0     0 18165 17773  2  4 94  0
 1  0      0 6883428 395364 527288    0    0     0     0 16436 14367  2  4 93  0
 1  0      0 6882928 395364 527288    0    0     0     0 16330 10098  2  3 95  0
 0  0      0 6884584 395364 527288    0    0     0    16 16579 9063  3  4 92  0
 1  0      0 6885632 395364 527292    0    0     0    12 14936 11526  2  3 95  0
 1  0      0 6884028 395364 527292    0    0     0     0 16808 13303  2  4 93  0
 0  0      0 6884408 395364 527292    0    0     0     0 16623 8892  2  4 94  0
 1  0      0 6884896 395364 527292    0    0     0     0 14480 8565  2  3 95  0
 1  0      0 6884532 395364 527292    0    0     0     0 14760 10602  1  3 95  0

Could 33% CPU usage for HAProxy be too much? We are running on an AMD Opteron 4133 (8 cores):

root@NaventLB1:~# top

top - 08:28:25 up 230 days, 15:08,  5 users,  load average: 0.51, 0.36, 0.34
Tasks: 145 total,   3 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.4%us,  1.4%sy,  0.0%ni, 95.9%id,  0.0%wa,  0.0%hi,  1.2%si,  0.0%st
Mem:   8162012k total,  1276420k used,  6885592k free,   395364k buffers
Swap:  9764860k total,        0k used,  9764860k free,   527432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                       
10199 haproxy   20   0  103m  56m  732 R   33  0.7  17:57.96 haproxy                                                                                                                                        
64385 www-data  20   0 4098m  26m 1860 S    4  0.3   1387:14 pound                                                                                                                                          
    1 root      20   0 24320 2196 1272 S    0  0.0   5:46.10 init   

Cheers!

Best Answer

I've resolved my problem (sort of) by using another HAProxy instance to serve static content. Performance is now very high and response times are much better than before.

Anyway, I will keep investigating what happened here.
