Nginx configuration for long-running tasks

django, gunicorn, nginx, python

I have a web-based application that performs linguistic analysis of user-submitted texts. This is a rather memory-intensive task that typically takes a long time (e.g., up to 3 minutes to process 30 files). I'm using Django's StreamingHttpResponse to stream results back, but I've noticed that nginx drops the user's request after about 7 files have been processed (less than 50 seconds). I tried adjusting both the nginx and Gunicorn keepalive settings, but that doesn't seem to help. Could anyone give me some pointers on this?
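For context, the timeouts most likely to cut off a slow streaming response are nginx's `proxy_read_timeout` (default 60s) and Gunicorn's worker timeout (default 30s), not the keepalive settings. A minimal sketch of raising both, assuming nginx proxies to Gunicorn on `127.0.0.1:8000` and the endpoint path `/analyze/` is illustrative:

```nginx
# Sketch: raise proxy timeouts for the slow streaming endpoint.
# The /analyze/ location and the 300s value are illustrative assumptions.
location /analyze/ {
    proxy_pass         http://127.0.0.1:8000;
    proxy_read_timeout 300s;   # nginx default is 60s
    proxy_send_timeout 300s;
    proxy_buffering    off;    # flush streamed chunks to the client immediately
}
```

On the Gunicorn side, the worker timeout would need a matching bump, e.g. `gunicorn --timeout 300 myproject.wsgi` (its default of 30s kills any request running longer than that).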

I'm also wondering what is the best approach to tackle a task that takes a long time to compute? Asynchronously?

Best Answer

I'm also wondering what is the best approach to tackle a task that takes a long time to compute? Asynchronously?

This is what worker queues are for. Consider separating file submission from processing: let the user submit the files, save them, and add a message to a worker queue so they are processed asynchronously. The user gets on with their business (perhaps watching a progress screen), but the work is no longer tied to that web session.

Meanwhile, a separate process picks new tasks off the worker queue and processes each one independently of whatever the user is doing. There are many such queueing systems, for example Amazon AWS SQS:

https://aws.amazon.com/sqs/
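The submit/consume split described above can be sketched in-process with Python's standard library; this is only an illustration of the pattern (in production you'd use a real broker such as SQS or RabbitMQ, typically via Celery), and the names `submit_files` and `analyze` are made up for the example:

```python
import queue
import threading

task_queue = queue.Queue()
results = {}

def analyze(filename):
    # Placeholder for the memory-intensive linguistic analysis.
    return f"analysis of {filename}"

def worker():
    # Separate consumer: pulls tasks independently of any web request.
    while True:
        filename = task_queue.get()
        if filename is None:       # sentinel value shuts the worker down
            break
        results[filename] = analyze(filename)
        task_queue.task_done()

def submit_files(filenames):
    # The web view would just enqueue the work and return immediately.
    for name in filenames:
        task_queue.put(name)

t = threading.Thread(target=worker, daemon=True)
t.start()
submit_files(["a.txt", "b.txt"])
task_queue.join()   # in real life the user polls a status endpoint instead
task_queue.put(None)
t.join()
```

After the queue drains, `results` holds one entry per submitted file; the key point is that `submit_files` returns as soon as the messages are enqueued, so the HTTP response never has to wait out the 3-minute analysis.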