Nginx Tornado Combination Causing 502 Bad Gateway Errors

nginx, python, tornado

We are seeing intermittent 502 errors, and tracking down the cause has been a very frustrating exercise. We can reproduce the problem by sending several simultaneous requests in quick succession. Here, "several" means only 10 to 20 requests within 5 seconds (not a typo), a load that should clearly be handled easily.

We really like the Nginx + Tornado approach, but we are considering moving to a more traditional (e.g. threading) approach because this problem has been so difficult to solve. I was wondering if you know a) how to fix this issue and b) how we can track down the culprit(s).

The log files show only that a connection was refused. We have the same problem as this post:
https://stackoverflow.com/questions/2962439/how-do-i-debug-a-http-502-error

But no answer there explains how to solve the problem, so I'm hoping you can help, because this may be a common issue with this type of setup.

Thanks in advance,

Paul

Best Answer

By default, nginx does not retry a request on another upstream when one of them sends back a 502 error. To enable that, add this to your configuration:

proxy_next_upstream error timeout http_502;

This prevents the 502 errors from being sent directly back to the client and instead makes nginx look for a working upstream. It will try all of the upstreams before returning a failure to the client, according to this post:

http://forum.nginx.org/read.php?2,152071,152212
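
To show where the directive sits, here is a minimal sketch of a proxy setup in front of several Tornado processes. The upstream name `tornado_backends`, the ports, and the `listen` value are assumptions for illustration only, not taken from the question; adjust them to match where your Tornado instances actually listen. Both blocks belong inside the `http` context of your nginx configuration.

```nginx
# Hypothetical pool of Tornado processes (name and ports are assumptions).
upstream tornado_backends {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;

    location / {
        # Forward requests to the pool defined above.
        proxy_pass http://tornado_backends;

        # Pass the request to the next server in the pool on a connection
        # error, a timeout, or a 502 response from the current one.
        proxy_next_upstream error timeout http_502;
    }
}
```

Note that retrying only helps when the `upstream` block lists more than one server; with a single backend there is nothing else for nginx to try.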

Here are more details on the configuration directive:

http://wiki.nginx.org/HttpProxyModule#proxy_next_upstream