HAProxy stats says all Thrift servers are down

haproxy supervisord

I am trying to set up HAProxy to load balance a group of Thrift servers. For some reason, the HAProxy stats page says the servers are all down.

Here is the HAProxy config I am currently using.

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     60000
    user        haproxy
    group       haproxy

defaults
    log     global
    option  dontlognull
    option redispatch
    retries 3
    maxconn 2000
    contimeout      5000
    clitimeout      50000
    srvtimeout      50000

listen stats :5000
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth user:pass

listen metrix :5002
    mode tcp
    option tcplog
    balance leastconn
    server m1 127.0.0.1:9000 check
    server m2 127.0.0.1:9001 check
    server m3 127.0.0.1:9002 check
    server m4 127.0.0.1:9003 check
    server m5 127.0.0.1:9004 check
    server m6 127.0.0.1:9005 check
    server m7 127.0.0.1:9006 check
    server m8 127.0.0.1:9007 check

One other thing: the Thrift servers are running under Supervisor, which I have noticed some weirdness with. However, I have also tried running a Thrift server outside of Supervisor, and it still doesn't work.

I have tried all of the Thrift server types (TSimpleServer, TNonBlockingServer, TThreadedServer), and all of them have the same problem.

Update

Running tcpdump port 9000 shows:

15:12:31.878502 IP ip-00.00.00.00.ec2.internal.cslistener > ip-11.11.11.11.ec2.internal.36206: Flags [R.], seq 0, ack 3433673377, win 0, length 0
15:12:33.878425 IP ip-11.11.11.11.ec2.internal.36207 > ip-00.00.00.00.ec2.internal.cslistener: Flags [S], seq 3459211721, win 5840, options [mss 1460,sackOK,TS val 440815982 ecr 0,nop,wscale 10], length 0

Here, 00.00.00.00 is the server running the Thrift server and 11.11.11.11 is the server running HAProxy.
These two lines repeat continuously.

The output of netstat -tlnp contains the following:

tcp        0      0 127.0.0.1:9000              0.0.0.0:*                   LISTEN      19472/python

So the Thrift server is listening on the right port.

Best Answer

There's not enough in that tcpdump to be sure, but it looks like your health check is sending a SYN packet and getting a RST packet in return. (Feel free to post more of it in your actual question where you can format it properly.)

I suspect there's nothing listening on 127.0.0.1:9000 (or any of the other ports).

You can check this with sudo netstat -tlnp.

Either the Thrift servers are listening on different IP addresses and/or ports, or they aren't listening at all.
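
If it helps, here is a rough Python sketch of the same kind of probe that a bare "check" on a server line performs in tcp mode: it just tries to open a TCP connection. The function name tcp_check and the 2 second timeout are my own choices for illustration, and the addresses are copied from the config in the question. Run it on the HAProxy machine. A "refused" result corresponds to the SYN followed by RST seen in the tcpdump.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Try a plain TCP connect and return "up" or a short failure reason.

    Hypothetical helper for illustration; it only approximates what a
    TCP-level health check does (connect, then close).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except ConnectionRefusedError:
        # The peer answered the SYN with a RST.
        return "refused (RST): nothing is listening on this address and port"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout: no reply, possibly dropped by a firewall"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return "error: {}".format(exc)


if __name__ == "__main__":
    # Addresses and ports taken from the "listen metrix" section above.
    for port in range(9000, 9008):
        print("127.0.0.1:{} -> {}".format(port, tcp_check("127.0.0.1", port)))
```

To test the servers from another machine, replace 127.0.0.1 with the address that machine would use to reach them. A server bound only to 127.0.0.1 will answer "up" locally and "refused" from anywhere else.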