SSL – HAProxy 1.6 SSL connections resetting


I have been beating my head against this HAProxy issue for entirely too long, and I am hoping someone here has seen this before.

Here is some information:

  1. Our HAP instances are in AWS across three regions
  2. The only traffic these instances receive is from our clients
  3. Each of our clients runs an HAProxy install that forwards requests from their end users on ports 80 and 443 to us on ports 1025 and 1026, respectively.
  4. These requests are forwarded over TCP using proxy protocol to our HAP instances.
  5. Our HAP instances then terminate SSL and forward the requests to our backend on port 80.
  6. Our routing is all done inside of Route53 with health checks.
  7. These health checks mark an instance as failed if it fails to reply in time three times within 30 seconds. The checks fire around 4 times a second.
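To make the failure rule in item 7 concrete, here is a minimal sketch of a "three failures within 30 seconds" check. It only illustrates the rule as described above and is not Route53's actual algorithm; the function name `is_unhealthy` and its parameters are hypothetical.

```python
def is_unhealthy(failure_times, now, threshold=3, window=30.0):
    """Return True if at least `threshold` failures fall inside the last `window` seconds.

    `failure_times` is a list of timestamps (in seconds) at which a health
    check timed out. All names and defaults here are illustrative assumptions.
    """
    # Keep only the failures that happened within the window ending at `now`.
    recent = [t for t in failure_times if 0 <= now - t <= window]
    return len(recent) >= threshold


# Three timeouts inside one 30 second window mark the instance as failed.
print(is_unhealthy([1.0, 10.0, 25.0], now=30.0))
# An old failure that has aged out of the window no longer counts.
print(is_unhealthy([1.0, 10.0, 45.0], now=45.0))
```

With checks firing roughly four times a second, only a handful of stalled handshakes in any 30 second span would be enough to trip this rule.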

Now for the problem: Every so often these servers hang on SSL handshakes, causing the handshake to reset (found this in a tcpdump during a failure) and thus causing the health checks to hang long enough to register a failure. This happened about 450 times over the weekend across 6 instances.

Memory and CPU aren't spiky enough to cause any alarm, even while the handshake is hanging.

Here is the config for the HAP instances:

# HAProxy Config
# Global settings
global
    log local2

    pidfile     /var/run/
    maxconn     30000
    user        haproxy
    group       haproxy
    ssl-default-bind-options no-sslv3 no-tls-tickets
    tune.ssl.default-dh-param 2048

# turn on stats unix socket
#    stats socket /var/lib/haproxy/stats

# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
defaults
    mode                    http
    log                     global
    option                  httplog
    retries                 3
    timeout http-request    5s
    timeout queue           1m
    timeout connect         31s
    timeout client          31s
    timeout server          31s
    maxconn                 15000

# Stats
    stats           enable
    stats uri           /haproxy?stats
    stats realm         Strictly\ Private
    stats auth          $StatsUser:$StatsPass

# main frontend which proxies to the backends

frontend shared_incoming
    maxconn 15000
    timeout http-request 5s

#    Bind ports of incoming traffic
    bind *:1025 accept-proxy # http
    bind *:1026 accept-proxy ssl crt /path/to/default/ssl/cert.pem ssl crt /path/to/cert/folder/ # https
    bind *:1027 # Health checking port
    acl gs_texthtml url_reg \/gstext\.html    ## allow gs to do meta tag verification
    acl gs_user_agent hdr_sub(User-Agent) -i globalsign    ## allow gs to do meta tag verification

#      Add headers
    http-request set-header $Proxy-Header-Ip %[src]
    http-request set-header $Proxy-Header-Proto http if !{ ssl_fc }
    http-request set-header $Proxy-Header-Proto https if { ssl_fc }

#     Route traffic based on domain
    use_backend gs_verify if gs_texthtml or gs_user_agent    ## allow gs meta tag verification
    use_backend %[req.hdr(host),lower,map_dom(/path/to/map/,unknown_domain)]

#     Drop unrecognized traffic
    default_backend unknown_domain

# Backends

backend server0  ## added to allow gs ssl meta tag verification
    reqrep ^GET\ /.*\ (HTTP/.*)    GET\ /GlobalSignVerification\ \1
    server server0_http

backend server1
    server server1_http

backend server2
    server server2_http

backend server3
    server server3_http

backend server4
    server server4_http

backend server5
    server server5_http

backend server6
    server server6_http

backend server7
    server server7_http

backend server8
    server server8_http

backend server9
    server server9_http

backend unknown_domain
    timeout connect 4s
    timeout server 4s
    errorfile 503 /etc/haproxy-shared/errors/404.html

Best Answer

If SSL is involved, I'd take a look at your entropy pool; perhaps you've exhausted it. Keep an eye on "cat /proc/sys/kernel/random/entropy_avail" and see whether it drops close to 0 when you see the problem.


If so, you might look at installing rngd to feed the pool beyond what the kernel already gathers on its own.
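To line entropy readings up against the health check failures, it may help to log the value with timestamps instead of running cat by hand. The following is a minimal sketch that polls the same /proc file; the function names, the 200-bit "low" threshold, and the one second interval are assumptions chosen for illustration, not recommended values.

```python
import time
from pathlib import Path

# Kernel file named in the answer above.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate as an integer."""
    return int(Path(path).read_text().strip())


def is_low(value, threshold=200):
    """Flag readings below `threshold` (an arbitrary, assumed cutoff)."""
    return value < threshold


def monitor(samples=None, interval=1.0, threshold=200, path=ENTROPY_PATH):
    """Print one timestamped reading per interval; loop forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        value = read_entropy(path)
        flag = " LOW" if is_low(value, threshold) else ""
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} entropy_avail={value}{flag}", flush=True)
        taken += 1
        time.sleep(interval)
```

Calling monitor() on an affected instance and redirecting the output to a file would give a record to compare against the times of the failed health checks.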