Beyond the fact that Erlang was specifically developed to run in concurrent/parallelized/distributed situations, the two main techniques it employs to make this possible are:
No side effects
This means that when you give a function a piece of data to execute against, it will not (except in very strict cases) affect anything else in the system or running process. So if you execute a function 300 times all at once concurrently, none of those 300 executions will affect any of the others.
The implementation technique for ensuring no side effects is called "immutability", which roughly means "may not be mutated (changed)". As soon as you create a variable, its value may not be modified. Erlang implements this behavior with "single assignment": after you assign a value to a variable, you may not assign a value to it again.
X = 1.
X = 2. % This is not a valid operation; it fails with a badmatch error
This ensures that no code can accidentally change the value of X and cause a race condition, so the code is inherently thread-safe and concurrent use becomes trivial. This is very uncommon behavior among programming languages and the biggest reason Erlang manages to be so well suited for concurrent execution.
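The practical payoff of side-effect-free functions can be sketched in any language. Here is a minimal Python illustration (the function and thread counts are made up, not from the original) of calling a pure function from many threads without any execution interfering with another:

```python
from concurrent.futures import ThreadPoolExecutor

def double(n):
    # A pure function: it reads and writes no shared state,
    # so concurrent calls cannot interfere with one another.
    return n * 2

# 300 concurrent executions, mirroring the example above.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(double, range(300)))

print(results == [n * 2 for n in range(300)])  # True
```

Python does not enforce purity the way Erlang's single assignment does; the safety here comes purely from the discipline of not touching shared state.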
The actor model
This is a particular way of modelling that has been shown to make the implementation and management of concurrent processing very simple for developers. Straight from Wikipedia:
The Actor model adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-oriented programming languages, but differs in that object-oriented software is typically executed sequentially, while the Actor model is inherently concurrent. An actor is a computational entity that, in response to a message it receives, can concurrently:
- send a finite number of messages to other actors;
- create a finite number of new actors;
- designate the behavior to be used for the next message it receives.
There is no assumed sequence to the above actions and they could be carried out in parallel. Decoupling the sender from communications sent was a fundamental advance of the Actor model, enabling asynchronous communication and control structures as patterns of passing messages.
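As a rough illustration of the mailbox idea (not from the original answer, and far simpler than a real actor runtime like Erlang's), an actor can be sketched in Python as a thread draining its own private queue:

```python
import queue
import threading

class Actor:
    """A minimal actor: one private mailbox, one thread draining it in order."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Senders never wait on the receiver; they just enqueue and move on.
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # sentinel: shut the actor down
                break
            # The "behavior" here is just doubling; a real actor could also
            # send messages to other actors or spawn new ones in response.
            self.results.append(message * 2)

    def stop(self):
        self.send(None)
        self._thread.join()

actor = Actor()
for n in range(5):
    actor.send(n)
actor.stop()
print(actor.results)  # [0, 2, 4, 6, 8]
```

Because the actor alone touches its own state and everyone else can only send it messages, no locks are needed, which is the essence of what the quote above describes.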
There are several implementations of Python, for example, CPython, Jython, IronPython, PyPy, etc.
Some of them have a GIL, some don't. For example, CPython has the GIL:
From http://en.wikipedia.org/wiki/Global_Interpreter_Lock
Applications written in programming languages with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL.
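In CPython, that design usually means reaching for the standard library's multiprocessing module. A minimal sketch (the worker function and sizes are illustrative, not from the original):

```python
from multiprocessing import Pool

def cpu_bound(n):
    # CPU-bound pure-Python work; in threads this would serialize on the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL,
    # so the four calls can run on separate cores in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [10_000] * 4)
    print(len(results))  # 4
```

The trade-off is that processes do not share memory, so data passed to and from workers is pickled across process boundaries.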
Benefits of the GIL
- Increased speed of single-threaded programs.
- Easy integration of C libraries that usually are not thread-safe.
Why Python (CPython and others) uses the GIL
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
Python has a GIL as opposed to fine-grained locking for several reasons:
- It is faster in the single-threaded case.
- It is faster in the multi-threaded case for i/o bound programs.
- It is faster in the multi-threaded case for cpu-bound programs that do their compute-intensive work in C libraries.
- It makes C extensions easier to write: there will be no switch of Python threads except where you allow it to happen (i.e. between the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros).
- It makes wrapping C libraries easier. You don't have to worry about thread-safety. If the library is not thread-safe, you simply keep the GIL locked while you call it.
The GIL can be released by C extensions. Python's standard library releases the GIL around each blocking i/o call. Thus the GIL has no consequence for performance of i/o bound servers. You can thus create networking servers in Python using processes (fork), threads or asynchronous i/o, and the GIL will not get in your way.
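A small sketch of that effect, using time.sleep as a stand-in for a blocking call (like real i/o, it releases the GIL while it waits); the thread count and durations are illustrative:

```python
import threading
import time

def blocking_io():
    # Stand-in for a blocking i/o call; the GIL is released while it waits,
    # so other Python threads keep running.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# The four 0.2 s waits overlap instead of adding up to 0.8 s.
print(elapsed)
```

If the GIL were held across the sleep, the threads would serialize and the total would approach 0.8 s; because it is released, the waits overlap.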
Numerical libraries in C or Fortran can similarly be called with the GIL released. While your C extension is waiting for an FFT to complete, the interpreter will be executing other Python threads. A GIL is thus easier and faster than fine-grained locking in this case as well. This constitutes the bulk of numerical work. The NumPy extension releases the GIL whenever possible.
Threads are usually a bad way to write most server programs. If the load is low, forking is easier. If the load is high, asynchronous i/o and event-driven programming (e.g. using Python's Twisted framework) is better. The only excuse for using threads is the lack of os.fork on Windows.
The GIL is a problem if, and only if, you are doing CPU-intensive work in pure Python. Here you can get a cleaner design using processes and message-passing (e.g. mpi4py). There was also a 'processing' module in the Python cheese shop, now in the standard library as multiprocessing, that gives processes the same interface as threads (i.e. replace threading.Thread with multiprocessing.Process).
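A minimal sketch of that thread-like, message-passing interface (the worker and values are illustrative, not from the original):

```python
from multiprocessing import Process, Queue

def worker(q, n):
    # Each Process runs in its own interpreter with its own GIL;
    # results come back by message passing rather than shared memory.
    q.put(n * n)

if __name__ == "__main__":
    q = Queue()
    # Same shape as threading.Thread(target=..., args=...), but with processes.
    procs = [Process(target=worker, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    results = sorted(q.get() for _ in range(4))  # drain before joining
    for p in procs:
        p.join()
    print(results)  # [0, 1, 4, 9]
```

The sorting is only there because the four processes may finish in any order; the message-passing itself imposes no ordering.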
Threads can be used to maintain responsiveness of a GUI regardless of the GIL. If the GIL impairs your performance (cf. the discussion above), you can let your thread spawn a process and wait for it to finish.
Best Answer
I have run up against the GIL in server side programming in almost every instance where I need something to scale to millions of concurrent users on multiple core machines.
Python is great for command line tools and things that don't need true concurrency to extract every last bit of performance from a given piece of hardware.
But for things that really need to squeeze everything out of something like a Sun T2000, you don't want to write it in Python; running 32 separate processes and trying to manage them all becomes an operational maintenance nightmare.
I abandoned Twisted in favor of Erlang a few years ago; Python just doesn't cut it in the large-scale concurrency space. The transparent distributed nature of Erlang means it scales horizontally as well as vertically.