Python – Should I stick with or abandon Python to deal with concurrency

Tags: concurrency, python

I have a 10K LOC project written in Django, with a fair amount of Celery (backed by RabbitMQ) for asynchronicity and background jobs where needed, and I have come to the conclusion that parts of the system would benefit from being rewritten in something other than Django for better concurrency. Reasons include:

  • Signal handling and mutable objects. Especially when one signal triggers another, handling them in Django using the ORM can be surprising when instances change or disappear. I want to use a messaging approach where the data passed along doesn't change in a handler (Clojure's copy-on-write approach seems nice, if I got it right).
  • Parts of the system are not web-based and need better support for performing tasks concurrently. For example, the system reads NFC tags, and when one is read an LED goes on for a few seconds (a Celery task), a sound is played (another Celery task), and the database is queried (another task). This is implemented as a Django management command, but Django and its ORM, being synchronous by nature and sharing memory, are limiting. (We are thinking of adding more NFC readers, and I don't think the Django + Celery approach will hold up; I'd like better message-passing capabilities.)

What are the pros and cons of using something like Twisted or Tornado compared with going for a language such as Erlang or Clojure? I am interested in practical benefits and detriments.

How did you come to the conclusion that some parts of the system would fare better in another language? Are you suffering performance problems? How severe are those problems? If it can be faster, is it essential that it is faster?

Example 1: Django at work outside an HTTP request:

  1. An NFC tag is read.
  2. The database (and possibly LDAP) is queried, and we want to do something when data becomes available (red or green light, play a sound). This blocks when using the Django ORM, but as long as there are Celery workers available it doesn't matter. It may become a problem with more stations.

Example 2: “message-passing” using Django signals:

  1. A post_delete event is handled, other objects may be altered or deleted because of this.
  2. At the end, notifications should be sent to users. Here, it would be nice if arguments passed to the notification handler were copies of deleted or to-be-deleted objects and guaranteed not to change in the handler. (It could be done manually simply by not passing objects managed by the ORM to handlers, of course.)

Best Answer

Opening Thoughts

How did you come to the conclusion that some parts of the system would fare better in another language? Are you suffering performance problems? How severe are those problems? If it can be faster, is it essential that it is faster?

Single-Thread Asynchrony

There are several questions and other web resources that already deal with the differences, pros, and cons of single-thread asynchrony vs. multi-thread concurrency. It’s interesting to read about how Node.js's single-thread asynchronous model performs when I/O is the major bottleneck and a great many requests are being serviced at once.

Twisted, Tornado, and other asynchronous frameworks make excellent use of a single thread. Since a lot of web programming involves lots of I/O (network, database, etc.), the time spent waiting on remote calls adds up significantly. That is time that could be spent doing other things: kicking off other database calls, rendering pages, and generating data. The utilisation of that single thread is extremely high.
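The effect described above can be sketched with Python's standard-library asyncio (a newer sibling of the Twisted/Tornado model, but the same single-thread idea); the `fake_db_query` name and delays are illustrative stand-ins for real I/O calls:

```python
import asyncio
import time

async def fake_db_query(name, delay):
    # Simulates a blocking I/O call (database, LDAP, network)
    # without holding the thread: the event loop switches away.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    start = time.monotonic()
    # Three "I/O calls" of 0.1 s each, overlapped on one thread.
    results = await asyncio.gather(
        fake_db_query("db", 0.1),
        fake_db_query("ldap", 0.1),
        fake_db_query("cache", 0.1),
    )
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)           # all three completed
print(elapsed < 0.25)    # in roughly 0.1 s total, not 0.3 s
```

Run sequentially, the same three calls would take about 0.3 s; overlapped, the waiting time is shared and the single thread stays busy.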

One of the greatest benefits of single-thread asynchrony is that it uses much less memory. In multi-thread execution, each thread requires a certain amount of reserved memory. As the number of threads increases, so does the amount of memory required just for the threads to exist. Since memory is finite, it means there are bounds on the number of threads that can be created at any one time.


Example

In the case of a web server, pretend each request is given its very own thread. Say 1MB of memory is required for each thread, and the web server has 2GB of RAM. This web server would be capable of processing (approximately) 2000 requests at any point in time before there just isn't enough memory to process any more.

If your load is significantly higher than this, requests are going to take a very long time (when waiting for older requests to complete), or you’re going to have to throw more servers into the cluster to expand the number of concurrent requests possible.
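The back-of-envelope arithmetic from the example above, written out (the 1 MB-per-thread figure is the assumption from the example, not a universal constant):

```python
# Capacity estimate: how many threads fit before memory runs out?
ram_bytes = 2 * 1024**3          # 2 GB of RAM on the web server
stack_per_thread = 1 * 1024**2   # ~1 MB reserved per thread (assumed)

max_threads = ram_bytes // stack_per_thread
print(max_threads)  # 2048 -- roughly the "2000 requests" ceiling
```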


Multi-thread Concurrency

Multi-thread concurrency instead relies on executing several tasks at the same time. That means that if a thread is blocked waiting on a database call to return, other requests can be processed at the same time. Thread utilisation is lower, but the number of threads executing is much larger.

Multi-thread code is also much harder to reason about. There are issues with locking, synchronisation, and other fun concurrency problems. Single-thread asynchrony doesn't suffer from the same problems.

Multi-thread code is much more performant for CPU-intensive tasks, however. If there are no opportunities for a thread to “yield”, such as a network call that would normally block, a single-thread model just isn't going to have any concurrency whatsoever.
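A small sketch of that failure mode: with no await inside a CPU-bound body, the event loop has no yield point, so even tasks launched "concurrently" run strictly one after another. (Note also that in CPython, the GIL means plain threads don't parallelise CPU work either; multiple processes are the usual remedy in Python.)

```python
import asyncio

order = []

def cpu_chunk(name):
    # Pure computation: no await inside, so the event loop
    # cannot switch away until this returns.
    total = sum(i * i for i in range(50_000))
    order.append(name)
    return total

async def task(name):
    return cpu_chunk(name)  # runs start-to-finish; no yield point

async def main():
    # Despite gather(), the CPU-bound bodies execute sequentially.
    await asyncio.gather(task("a"), task("b"), task("c"))

asyncio.run(main())
print(order)  # ['a', 'b', 'c'] -- one after another, no interleaving
```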

Both May Coexist

There's of course overlap between the two; they are not mutually exclusive. For instance, multi-thread code can be written in a non-blocking way, to better utilise each thread.


The Bottom Line

There are many other issues to consider, but I like to think about the two like this:

  • If your program is I/O bound, then single-thread asynchrony is probably going to work quite well.
  • If your program is CPU bound, then a multi-thread system will probably be best.

In your particular case, you need to determine what kind of asynchronous work is being completed, and how often those tasks arise.

  • Do they occur on every request? If so, memory is probably going to become an issue as the number of requests rises.
  • Are these tasks ordered? If so, you’re going to have to consider synchronisation if using multiple threads.
  • Are these tasks CPU intensive? If so, is a single-thread able to keep up with the load?

There's no simple answer. You must consider what your use cases are, and design accordingly. Sometimes an asynchronous single-thread model is better. Other times, using a number of threads to achieve massive parallel processing is required.

Other Considerations

There are other issues to consider besides the concurrency model you choose. Do you know Erlang or Clojure? Do you think you'd be capable of writing safe multi-thread code in one of these languages, such that you improve the performance of your application? Will it take a long time to get up to speed in one of these languages, and will the language you learn benefit you in the future?

How about the difficulties associated with communication between these two systems? Will it be overly complex maintaining two separate systems in parallel? How will the Erlang system receive tasks from Django? How will Erlang communicate those results back to Django? Is performance significant enough a problem that the added complexity is worth it?


Final Thoughts

I've always found Django to be quick enough, and it is used by some very heavily trafficked sites. There are several performance optimisations you can make to increase the number of concurrent requests and response time. Admittedly, I haven’t done anything with Celery thus far, so the usual performance optimisations are probably not going to solve any issues you may be having with these asynchronous tasks.

Of course, there’s always the suggestion of throwing more hardware at the problem. Is the cost of provisioning a new server cheaper than the development and maintenance cost of an entirely new subsystem?

I've asked far too many questions at this point, but that was my intention. The answer is not going to be easy without analysis and further detail. Being able to analyse the problems comes down to knowing the questions to ask, though…so hopefully I've helped on that front.

My gut feeling says that a rewrite in another language is unnecessary. The complexity and cost will probably be too great.


Edit

Response to Follow-Up

Your follow-up presents some very interesting use cases.


1. Django working outside HTTP requests

Your first example involved reading NFC tags, then querying the database. I don’t think that writing this part in another language will be that useful to you, simply because querying the database or an LDAP server is going to be bound by network I/O (and potentially database performance). On the other hand, the number of concurrent requests will be bound by the server itself, since each management command will be run as its own process. There will be setup and teardown time that impacts performance, since you aren't sending messages to an already running process. You will, however, be able to send multiple requests at the same time, since each will be an isolated process.

For this case, I see two avenues you can investigate:

  1. Ensure that your database is capable of handling multiple queries at once, with connection pooling if appropriate. (Oracle, for example, requires that you configure Django with 'OPTIONS': {'threaded': True} in the database settings.) There may be similar options at the database level or the Django level that you can tweak for your own database. No matter what language you write your database queries in, you will have to wait for this data to return before you can light up the LEDs. The performance of the querying code can make a difference, though, and the Django ORM isn't lightning fast (but usually fast enough).
  2. Minimise setup/teardown time. Have a constantly running process, and send messages to it. (Correct me if I’m wrong, but this is what your original question is actually focusing on.) Whether this process is written in Python/Django or another language/framework is covered above. I don’t like the idea of using management commands so frequently. Is it possible to have a small piece of code running constantly, that pushes messages from the NFC readers onto the message queue, which Celery then reads and forwards to Django? The setup and teardown of a small program, even if it’s written in Python (but not Django!), should be better than starting and stopping a Django program (with all of its subsystems).
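The second avenue, a constantly running process feeding a queue, can be sketched in miniature with the standard library. Everything here is illustrative: `nfc_reader_loop` stands in for polling the actual NFC hardware, and the in-process `queue.Queue` plus worker thread stand in for RabbitMQ plus a Celery worker:

```python
import queue
import threading

events = queue.Queue()
handled = []

def nfc_reader_loop(tag_ids):
    # Stand-in for polling the NFC hardware. The process stays up,
    # so there is no Django setup/teardown cost per tag read.
    for tag_id in tag_ids:
        events.put(tag_id)
    events.put(None)  # sentinel: tell the worker to shut down

def worker():
    # Stand-in for the Celery worker that would consume the queue.
    while True:
        tag_id = events.get()
        if tag_id is None:
            break
        # Here you would query the database and trigger
        # the LED and sound tasks.
        handled.append(tag_id)

t = threading.Thread(target=worker)
t.start()
nfc_reader_loop(["tag-1", "tag-2", "tag-3"])
t.join()
print(handled)  # ['tag-1', 'tag-2', 'tag-3']
```

The key property is that the reader only ever enqueues a small message; all the slow work happens in the consumer, which can be scaled independently.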

I’m unsure what web server you’re using for Django. mod_wsgi for Apache lets you configure the number of processes and threads within the processes that service requests. Be sure to tweak the relevant configuration of your web server to optimise the number of serviceable requests.


2. “Message-passing” with Django signals

Your second use case is also fairly interesting; I’m not sure if I have the answers for it. If you’re deleting model instances and wish to operate on them later, it might be possible to serialize them with json.dumps and later deserialize them with json.loads (or use Django’s built-in serializers). It will be impossible to fully recreate the object graph later on (querying related models), since related fields are lazily loaded from the database and that link will no longer exist.
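The snapshot idea can be sketched without Django at all. Here `instance_to_dict` is a hypothetical stand-in for something like django.forms.models.model_to_dict (or a manual copy of field values), and the instance is represented as a plain dict for the sake of the example:

```python
import json

def instance_to_dict(instance):
    # Copy only plain field values; related objects would need
    # explicit handling (they cannot be lazily loaded after deletion).
    return {k: v for k, v in instance.items() if not k.startswith("_")}

def snapshot(instance):
    # A round-trip through JSON guarantees the handler receives
    # plain data that no later ORM operation can mutate.
    return json.loads(json.dumps(instance_to_dict(instance)))

record = {"id": 42, "name": "tag-station", "_state": "orm-internal"}
copy = snapshot(record)

record["name"] = "changed-after-delete"  # later mutation...
print(copy["name"])  # 'tag-station' -- ...does not affect the snapshot
```

Passing such snapshots (rather than live ORM objects) to signal handlers gives the copy-semantics guarantee the question asks for.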

The other option would be to somehow mark an object for deletion, and only delete it at the end of the request/response cycle (after all signals have been serviced). It might require a custom signal to implement this, rather than relying on post_delete.