Nginx + FastCGI + Django – getting data corruption in the responses sent to the client

I am running Django behind nginx using FastCGI. I have discovered that some of the responses sent to the client contain random data corruption in the middle of the body (typically a run of a couple hundred bytes or so).

At this point I have narrowed it down to a bug in either nginx's FastCGI handler or Django's FastCGI handler (i.e. probably a bug in flup), since this problem never occurs when I run the Django server in standalone (i.e. runserver) mode. It only happens in FastCGI mode.

Other interesting trends:

It tends to happen on larger responses. When a client logs in for the first time, they are sent a bunch of 1MB chunks to sync them up to the server DB. After that first sync, the responses are much smaller (usually a few KB at a time). The corruption always seems to happen on those 1MB chunks sent at the start.
It happens more often when the client is connected to the server via LAN (i.e. low-latency, high-bandwidth connection). This makes me think there is some kind of race condition in nginx or flup that is exacerbated by an increased data rate.

Right now, I've had to work around this by putting an extra SHA1 digest of the body in a response header and having the client reject any response whose body checksum doesn't match that header, but this is kind of a horrible solution.
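For reference, the checksum workaround can be sketched roughly as below. This is a minimal illustration rather than the actual code: the helper names `body_digest` and `body_matches` are hypothetical, and the 1MB payload is a stand-in for a sync chunk.

```python
import hashlib


def body_digest(body):
    """Return the hex SHA-1 digest of a response body given as bytes."""
    return hashlib.sha1(body).hexdigest()


def body_matches(body, header_digest):
    """Client-side check: True if the received body matches the digest header."""
    return body_digest(body) == header_digest.strip().lower()


# Server side: compute the digest over the exact bytes that will be sent.
payload = b"x" * (1024 * 1024)  # stand-in for one 1MB sync chunk
digest = body_digest(payload)

# Simulate the failure: a couple hundred bytes overwritten mid-body,
# with the total length unchanged.
corrupted = payload[:500000] + b"\x00" * 200 + payload[500200:]
```

The client would call `body_matches()` on each response and re-request the chunk when it returns False. This detects the corruption but does nothing to explain it.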

Has anyone else experienced anything like this, or does anyone have pointers on how to identify whether flup or nginx is at fault, so I can file a bug with the appropriate team?

Thanks in advance for any help.

Note: I also posted a similar bug in lighttpd + FastCGI + Django a while back here: https://stackoverflow.com/questions/3714489/lighttpd-fastcgi-django-truncated-response-sent-to-client-due-to-unexpected … Even though this isn't the same thing (truncation vs. corruption), it's starting to look like the common culprit is flup / Django rather than the web server.

Edit: I should also note what my environment is:

OSX 10.6.6 on a Mac Mini
Python 2.6.1 (System)
Django 1.3 (from official tarball)
flup 1.0.2 (from Python egg on flup site)
nginx +ssl 1.0.0 (from Macports)

EDIT: In response to Jerzyk's comment, the code path that assembles the response looks like (edited for succinctness):

# This returns an objc NSData object, which is an array.array 
# when pushed through the PyObjC bridge
ret = handler( request ) 

response = HttpResponse( ret )
response[ "Content-Length" ] = len( ret )
return response

I don't think it's possible that the Content-Length is wrong based on that, and AFAIK there is no way to mark a Django HttpResponse object as explicitly binary as opposed to text. Also, since the issue happens only intermittently, I don't think a wrong length or a text/binary mix-up explains it; otherwise you would presumably see it on every request.
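One way to rule out a length mismatch entirely is to flatten the array to a byte string first and take the length of that. The sketch below assumes the bridged object behaves like a plain `array.array`. The helper name `body_and_length` is hypothetical, and the typecodes are only examples, since the typecode PyObjC actually produces is not shown here.

```python
from array import array


def body_and_length(data):
    """Flatten an array.array into bytes and return (body, byte_length)."""
    body = data.tobytes()  # Python 3.2+; Python 2.6 spells this tostring()
    return body, len(body)


# With a one-byte typecode, len(array) already equals the byte count.
single_byte = array("B", [1, 2, 3, 4])

# With a wider typecode, len(array) counts items, not bytes.
wide = array("H", [1, 2, 3, 4])
```

For a single-byte array the two lengths agree, so `len( ret )` would be correct in that case. For wider item types they differ by a factor of `itemsize`.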

EDIT @ionelmc: You have to set the Content-Length in Django; nginx does not set it for you. This is the response I got once I disabled setting Content-Length explicitly (note the chunked transfer encoding and the missing Content-Length header):

$ curl -i http://localhost/io/ping
HTTP/1.1 200 OK
Server: nginx/1.0.0
Date: Thu, 23 Jun 2011 13:37:14 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive

AKSJDHAKLSJDHKLJAHSD

Best Answer

Do you have any kind of nginx caching (bypass / no_cache) directive active for the FastCGI responses?

The nginx 1.0.3 changelog lists a fix for a response-corruption bug:

Bugfix: a cached response may be broken if "proxy/fastcgi/scgi/uwsgi_cache_bypass" and "proxy/fastcgi/scgi/uwsgi_no_cache" directive values were different; the bug had appeared in 0.8.46.

Source: http://nginx.org/en/CHANGES (1.0.3 section)
