Nginx with Passenger behind HAProxy causes 503 errors

Tags: 503-error, haproxy, nginx, phusion-passenger

Over the last month I was forced to learn a lot about server configuration, integration, AWS, and the like. I had never done any of this to this extent before.

I got everything up and running well for my app (thanks mostly to the http://github.com/wr0ngway/rubber gem and help from the #rubberec2 IRC channel). However, I'm running into a problem that is mysterious to me.

Stack

I am running Nginx + Passenger behind HAProxy. So far only one Nginx + Passenger host is being used, so HAProxy doesn't really do much yet, but we will add more app servers in the future.

Problem

I am stuck with occasional 503 errors that become annoying at certain times of day (during higher load?). The errors happen on both static assets and routed URLs. I have determined that HAProxy is the one throwing them, because the page and its headers are identical to what's in /etc/haproxy/errors/503.http.

I thought Nginx didn't care how many requests it receives and could handle all of them, since it has its own queueing and Passenger distributes requests correctly. So why does HAProxy claim there was no server available to handle some requests?

My HAProxy config

global
  log 127.0.0.1 local0 warning
  maxconn 1024

defaults
  log global
  mode http
  retries 3
  balance roundrobin
  option abortonclose
  option redispatch
  option httplog
  contimeout 4000
  clitimeout 150000
  srvtimeout 30000

listen passenger_proxy x.x.x.x:x
  option forwardfor
  server web01 web01:xxxx maxconn 20 check

Note: IPs and ports are replaced with x's.

P.S. I'm not good at this stuff, learning as I go.

Update

I used siege to benchmark the server and found that I can reproduce the 503s at roughly 58 concurrent sessions. The success rate drops to only 54% in that case.
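For reference, the run looked something like this (the URL follows the x convention above and the exact flags are illustrative; -c sets the number of concurrent users, -b runs without delays between requests, -t limits the run time):

  siege -b -c 58 -t 1M http://x.x.x.x:x/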

Update 2

I found that the Nginx access log records "-" 400 0 "-" "-" "-" every time I get a 503.

Update 3

Everyone says that Nginx returns "400 Bad Request" when the cookies are too big. However, setting the large_client_header_buffers directive didn't fix it for me.
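For the record, this is roughly what I tried in the Nginx config (the buffer count and size are just example values, not a recommendation):

  http {
    # allow up to 4 header buffers of 16k each for long request lines / large cookies
    large_client_header_buffers 4 16k;
  }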

Update 4

I ran siege on the server, targeting Nginx directly on its listen port, and now Nginx returns 499 errors in the same pattern the 503s followed before. Siege reports that the connection timed out whenever that happens. Looks like I'm getting closer.

Update 5

I noticed that Nginx was logging to two places on my system, and one error log recorded this message every time siege showed "Connection timed out":

file=ext/nginx/HelperAgent.cpp:574 time=2011-09-15 07:43:22.196 ]: Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.

Best Answer

Per the HAProxy configuration guide, you need to increase the maxconn parameter on your server declaration.

When a server has a "maxconn" parameter specified, it means that its number of concurrent connections will never go higher. Additionally, if it has a "minconn" parameter, it indicates a dynamic limit following the backend's load. The server will then always accept at least <minconn> connections, never more than <maxconn>, and the limit will be on the ramp between both values when the backend has less than <fullconn> concurrent connections. This makes it possible to limit the load on the servers during normal loads, but push it further for important loads without overloading the servers during exceptional loads.

I highly suggest reading through the whole document, as there is a lot of good info in there.
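As a rough sketch of how that applies to the config above (the numbers are illustrative and need tuning to how many requests a single Nginx + Passenger host can actually queue; with maxconn 20, ~58 concurrent sessions will easily overflow the limit and trigger 503s):

  listen passenger_proxy x.x.x.x:x
    option forwardfor
    # backend load at which web01's dynamic limit reaches its maxconn
    fullconn 1000
    # accept at least 20 and at most 100 concurrent connections on web01;
    # anything beyond that waits in HAProxy's queue instead of being refused
    server web01 web01:xxxx minconn 20 maxconn 100 check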