Linux – HAProxy check says server is down when it is up

glassfishhaproxylinux

I am trying to setup 2 Glassfish servers in a load balanced configuration using UCARP and HAProxy

Server1 has 2 IPs x.x.x.17 and x.x.x.18

HAProxy is listening on only x.x.x.18 and Glassfish listening on only x.x.x.17 running with the following configuration…

global

maxconn 4096
debug
user haproxy
group haproxy

defaults

mode http
retries 3
option redispatch

listen wms x.x.x.18:8080
source x.x.x.18
option httpchk
balance leastconn
server Server1 x.x.x.17:8080 check inter 2000 fastinter 500 fall 2 weight 50
server Server2 x.x.x.19:8080 check inter 2000 fastinter 500 fall 2 weight 50

Server2 with 1 IP x.x.x.19 is running Glassfish

Even though I can manually wget the page from x.x.x.17:8080 and receive a 200 OK response, HAProxy says Server1 is DOWN and doesn't direct any requests to it. I can't find any reason why.

Here is an excerpt from the Server1 access log with the checks…

"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:29 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:29 +0000" "OPTIONS / HTTP/1.0" 200 0

Here is an excerpt from the Server2 access log with the checks…

"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:25 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:25 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:31 +0000" "OPTIONS / HTTP/1.0" 200 0
"x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:31 +0000" "OPTIONS / HTTP/1.0" 200 0

If I remove the httpchk option then Server1 checks as UP, however this is not a permanent solution because we need it to fail over properly if the response really fails.

Any ideas?

(HAProxy is v1.3.22)

Addn: I just tried adding server3 x.x.x.13 running Glassfish but on Windows and that also says down when it is up and accessible from the proxy machine.

Addn2: After installing v1.4 of haproxy to get error codes, the error is Layer7 invalid response info: "HTTP/1.1 ". When we retrieve the page manually both the UP and DOWN server return HTTP/1.1 200 OK as the first line.

So after running wireshark to see what is going on. On the glassfish server which works (and all the other webservers I've checked) the response HTTP/1.1 200 OK all comes in the first packet. On the glassfish servers that don't work the response comes in 3 packets of HTTP/1.1 then 200 then OK.

So any idea why HAProxy is not dealing with multiple packets or how to configure glassfish not to split it? (maxKeepAliveRequests=1 already)

Best Answer

The answer is that Glassfish in the latest versions splits the response into multiple packets.

I posted on the haproxy mailing list and had a remarkably quick response.

Krzysztof Oledzki confirmed that haproxy assumes that the response will all be contained within the the first packet as that is the behavior of most known web servers. He built a patch with a quick and dirty fix which is available in the mailing list archives if you search for Glassfish and can be applied to the beta or latest stable version 1.3.22

I also tried to find out why Glassfish has started to behave this way but without paid support I got nowhere. If anyone can answer that, the bounty is still open.