Beyond the fact that Erlang was specifically developed to run in concurrent/parallelized/distributed situations, the two main techniques it employs to make this possible are:
No side effects
This means that when you give a function a piece of data to execute against, it will not (except in very strict cases) affect anything else in the system or running process. So if you execute a function 300 times all at once concurrently, none of those 300 executions will affect any of the others.
The implementation technique for ensuring no side effects is called "immutability", which roughly means "may not be mutated (changed)". As soon as you create a variable, its value may not be modified. Erlang implements this behavior with "single assignment": after you assign a value to a variable, you may not assign a value to it again.
X = 1.
X = 2. % This is not a valid operation; it fails with a badmatch error
This ensures that no code can accidentally change the value of X and cause a race condition, so the code is inherently thread-safe and concurrent use becomes trivial. This is very uncommon behavior among programming languages and the biggest reason Erlang manages to be so well suited for concurrent execution.
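The practical payoff of side-effect-free functions can be sketched in any language. Here is a minimal Python illustration (the function and thread counts are made up, not from the original) of calling a pure function from many threads without any execution interfering with another:

```python
from concurrent.futures import ThreadPoolExecutor

def double(n):
    # A pure function: it reads and writes no shared state,
    # so concurrent calls cannot interfere with one another.
    return n * 2

# 300 concurrent executions, mirroring the example above.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(double, range(300)))

print(results == [n * 2 for n in range(300)])  # True
```

Python does not enforce purity the way Erlang's single assignment does; the safety here comes purely from the discipline of not touching shared state.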
The actor model
This is a particular way of modelling that has been shown to make the implementation and management of concurrent processing very simple for developers. Straight from Wikipedia:
The Actor model adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-oriented programming languages, but differs in that object-oriented software is typically executed sequentially, while the Actor model is inherently concurrent. An actor is a computational entity that, in response to a message it receives, can concurrently:
- send a finite number of messages to other actors;
- create a finite number of new actors;
- designate the behavior to be used for the next message it receives.
There is no assumed sequence to the above actions and they could be carried out in parallel. Decoupling the sender from communications sent was a fundamental advance of the Actor model, enabling asynchronous communication and control structures as patterns of passing messages.
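As a rough illustration of the mailbox idea (not from the original answer, and far simpler than a real actor runtime like Erlang's), an actor can be sketched in Python as a thread draining its own private queue:

```python
import queue
import threading

class Actor:
    """A minimal actor: one private mailbox, one thread draining it in order."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Senders never wait on the receiver; they just enqueue and move on.
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # sentinel: shut the actor down
                break
            # The "behavior" here is just doubling; a real actor could also
            # send messages to other actors or spawn new ones in response.
            self.results.append(message * 2)

    def stop(self):
        self.send(None)
        self._thread.join()

actor = Actor()
for n in range(5):
    actor.send(n)
actor.stop()
print(actor.results)  # [0, 2, 4, 6, 8]
```

Because the actor alone touches its own state and everyone else can only send it messages, no locks are needed, which is the essence of what the quote above describes.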
There are several implementations of Python, for example, CPython, Jython, IronPython, PyPy, etc.
Some of them have a GIL, some don't. For example, CPython has the GIL:
From http://en.wikipedia.org/wiki/Global_Interpreter_Lock
Applications written in programming languages with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL.
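In CPython, that design usually means reaching for the standard library's multiprocessing module. A minimal sketch (the worker function and sizes are illustrative, not from the original):

```python
from multiprocessing import Pool

def cpu_bound(n):
    # CPU-bound pure-Python work; in threads this would serialize on the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL,
    # so the four calls can run on separate cores in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [10_000] * 4)
    print(len(results))  # 4
```

The trade-off is that processes do not share memory, so data passed to and from workers is pickled across process boundaries.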
Benefits of the GIL
- Increased speed of single-threaded programs.
- Easy integration of C libraries that usually are not thread-safe.
Why Python (CPython and others) uses the GIL
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
Python has a GIL as opposed to fine-grained locking for several reasons:
- It is faster in the single-threaded case.
- It is faster in the multi-threaded case for i/o bound programs.
- It is faster in the multi-threaded case for cpu-bound programs that do their compute-intensive work in C libraries.
- It makes C extensions easier to write: there will be no switch of Python threads except where you allow it to happen (i.e. between the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros).
- It makes wrapping C libraries easier. You don't have to worry about thread-safety. If the library is not thread-safe, you simply keep the GIL locked while you call it.
The GIL can be released by C extensions. Python's standard library releases the GIL around each blocking i/o call. Thus the GIL has no consequence for performance of i/o bound servers. You can thus create networking servers in Python using processes (fork), threads or asynchronous i/o, and the GIL will not get in your way.
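A small sketch of that effect, using time.sleep as a stand-in for a blocking call (like real i/o, it releases the GIL while it waits); the thread count and durations are illustrative:

```python
import threading
import time

def blocking_io():
    # Stand-in for a blocking i/o call; the GIL is released while it waits,
    # so other Python threads keep running.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# The four 0.2 s waits overlap instead of adding up to 0.8 s.
print(elapsed)
```

If the GIL were held across the sleep, the threads would serialize and the total would approach 0.8 s; because it is released, the waits overlap.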
Numerical libraries in C or Fortran can similarly be called with the GIL released. While your C extension is waiting for an FFT to complete, the interpreter will be executing other Python threads. A GIL is thus easier and faster than fine-grained locking in this case as well. This constitutes the bulk of numerical work. The NumPy extension releases the GIL whenever possible.
Threads are usually a bad way to write most server programs. If the load is low, forking is easier. If the load is high, asynchronous i/o and event-driven programming (e.g. using Python's Twisted framework) is better. The only excuse for using threads is the lack of os.fork on Windows.
The GIL is a problem if, and only if, you are doing CPU-intensive work in pure Python. Here you can get a cleaner design using processes and message-passing (e.g. mpi4py). There was also a 'processing' module in the Python cheese shop, now in the standard library as multiprocessing, that gives processes the same interface as threads (i.e. replace threading.Thread with multiprocessing.Process).
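A minimal sketch of that thread-like, message-passing interface (the worker and values are illustrative, not from the original):

```python
from multiprocessing import Process, Queue

def worker(q, n):
    # Each Process runs in its own interpreter with its own GIL;
    # results come back by message passing rather than shared memory.
    q.put(n * n)

if __name__ == "__main__":
    q = Queue()
    # Same shape as threading.Thread(target=..., args=...), but with processes.
    procs = [Process(target=worker, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    results = sorted(q.get() for _ in range(4))  # drain before joining
    for p in procs:
        p.join()
    print(results)  # [0, 1, 4, 9]
```

The sorting is only there because the four processes may finish in any order; the message-passing itself imposes no ordering.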
Threads can be used to maintain responsiveness of a GUI regardless of the GIL. If the GIL impairs your performance (cf. the discussion above), you can let your thread spawn a process and wait for it to finish.
Best Answer
I have run up against the GIL in server side programming in almost every instance where I need something to scale to millions of concurrent users on multiple core machines.
Python is great for command line tools and things that don't need true concurrency to extract every last bit of performance from a given piece of hardware.
But for things that really need to squeeze everything out of something like a Sun T2000, you don't want to write it in Python; running 32 separate processes and trying to manage them all becomes an operational maintenance nightmare.
I abandoned Twisted in favor of Erlang a few years ago; Python just doesn't cut it in the large-scale concurrency space. The transparent distributed nature of Erlang means it scales horizontally as well as vertically.