Nginx with node and lingering_close keeping connections open for 5 seconds

cluster configuration nginx node.js

We have a collection of servers set up to use our new shiny API, which will replace our old API written in an arcane language.

The new application is a cluster of servers using node, just behind NGINX.

This is the same type of clustered setup as the old API.

There is another server sitting in front of these two clusters, using NGINX to route traffic to one or the other.

Right now the new cluster is getting much less than 1% of the traffic, while the old cluster is receiving much more than 99% of the traffic.

Logs indicate that the client (the client sitting in front of the NGINX router) is always receiving responses in a timely manner, regardless of which cluster processes the request.

Logs also indicate that node is responding to its local NGINX in a timely manner.

The old NGINX/API is working well.

However, the LOCAL NGINXs for the node cluster are logging that each request is taking the time it takes for node to respond… plus an extra 5 seconds.
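One way to see the gap between node's response time and the total time NGINX records is to log both side by side. The following is a minimal sketch, not the configuration actually in use here: the format name `timing` and the log path are hypothetical, while `$request_time`, `$upstream_response_time` and `$body_bytes_sent` are standard NGINX variables.

```nginx
# Sketch only: place in the http {} block.
# The format name "timing" and the log path are hypothetical.
log_format timing '$remote_addr "$request" $status $body_bytes_sent '
                  'request_time=$request_time '
                  'upstream_time=$upstream_response_time';

access_log /var/log/nginx/timing.log timing;
```

With a format like this, a request affected by the delay would be expected to show a `request_time` roughly 5 seconds larger than its `upstream_time`, and `$body_bytes_sent` makes it easy to compare against response size.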

A little investigation shows that this is due to a configuration setting called lingering_close… which waits for 5 seconds (the default lingering_timeout). According to the documentation, lingering close uses 'Heuristics' to decide when to stay open.

http://nginx.org/en/docs/http/ngx_http_core_module.html#lingering_close

That's more than a little vague.

We know that the connection only stays open for the extra 5 seconds when responses are smaller than 1.1k. I know that's weird… but 'Heuristics'.

If we turn lingering_close off… the connections close without the impact of Heuristics.
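For reference, a minimal sketch of where these directives would go, assuming node listens on a hypothetical local port 3000. `lingering_close` and `lingering_timeout` are real directives of the core HTTP module; everything else here is illustrative.

```nginx
# Sketch only: the upstream address and port are hypothetical.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:3000;

        # Option A: never wait for more client data before closing.
        lingering_close off;

        # Option B (alternative): keep the default "on" behaviour
        # but shorten the wait, which defaults to 5s.
        # lingering_timeout 1s;
    }
}
```

Note that the NGINX documentation cautions against `off` under normal circumstances, since closing while the client is still sending data can cause the client to see a connection reset. Shortening `lingering_timeout` may be the gentler option.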

This never seems to happen on the OLD cluster.

Does anyone have any clearer information on what Heuristics might be keeping the connection open, and possibly some advice on how to proceed?

My biggest worry is that when all traffic is moved to the second cluster, all of these open connections will start to cause a performance issue.

Best Answer

It's all about indications that there may be more data left in the socket. For example, lingering close is enabled if the complete request body was not read during processing, if there is more data left in the buffer, or if the socket is in an active state.

You may be interested in this change, which significantly reduces the probability of lingering close on Linux: http://hg.nginx.org/nginx/rev/f7849bfb6d21