Python – Get a unique ID for worker in python multiprocessing pool

multiprocessingpython

Is there a way to assign each worker in a python multiprocessing pool a unique ID in a way that a job being run by a particular worker in the pool could know which worker is running it? According to the docs, a Process has a name but

The name is a string used for identification purposes only. It has no
semantics. Multiple processes may be given the same name.

For my particular use-case, I want to run a bunch of jobs on a group of four GPUs, and need to set the device number for the GPU that the job should run on. Because the jobs are of non-uniform length, I want to be sure that I don't have a collision on a GPU of a job trying to run on it before the previous one completes (so this precludes pre-assigning an ID to the unit of work ahead of time).

Best Answer

It seems like what you want is simple: multiprocessing.current_process(). For example:

import multiprocessing

def f(x):
    print multiprocessing.current_process()
    return x * x

p = multiprocessing.Pool()
print p.map(f, range(6))

Output:

$ python foo.py 
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-3, started daemon)>
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-4, started daemon)>
[0, 1, 4, 9, 16, 25]

This returns the process object itself, so the process can be its own identity. You could also call id on it for a unique numerical id -- in cpython, this is the memory address of the process object, so I don't think there's any possibility of overlap. Finally, you can use the ident or the pid property of the process -- but that's only set once the process is started.

Furthermore, looking over the source, it seems to me very likely that autogenerated names (as exemplified by the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.counter object for every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child process with single-value ids, and they spawn process with two-value ids, and so on. Then, if no name is passed to the Process constructor, it simply autogenerates the name based on the _identity, using ':'.join(...). Then Pool alters the name of the process using replace, leaving the autogenerated id the same.

The upshot of all this is that although two Processes may have the same name, because you may assign the same name to them when you create them, they are unique if you don't touch the name parameter. Also, you could theoretically use _identity as a unique identifier; but I gather they made that variable private for a reason!

An example of the above in action:

import multiprocessing

def f(x):
    created = multiprocessing.Process()
    current = multiprocessing.current_process()
    print 'running:', current.name, current._identity
    print 'created:', created.name, created._identity
    return x * x

p = multiprocessing.Pool()
print p.map(f, range(6))

Output:

$ python foo.py 
running: PoolWorker-1 (1,)
created: Process-1:1 (1, 1)
running: PoolWorker-2 (2,)
created: Process-2:1 (2, 1)
running: PoolWorker-3 (3,)
created: Process-3:1 (3, 1)
running: PoolWorker-1 (1,)
created: Process-1:2 (1, 2)
running: PoolWorker-2 (2,)
created: Process-2:2 (2, 2)
running: PoolWorker-4 (4,)
created: Process-4:1 (4, 1)
[0, 1, 4, 9, 16, 25]

Related Solutions

Python – Replacements for switch statement in Python

The original answer below was written in 2008. Since then, Python 3.10 (2021) introduced the match-case statement which provides a first-class implementation of a "switch" for Python. For example:

def f(x):
    match x:
        case 'a':
            return 1
        case 'b':
            return 2
        case _:        
            return 0   # 0 is the default case if x is not found

The match-case statement is considerably more powerful than this simple example.

You could use a dictionary:

def f(x):
    return {
        'a': 1,
        'b': 2,
    }[x]

Python – What’s the canonical way to check for type in Python

To check if o is an instance of str or any subclass of str, use isinstance (this would be the "canonical" way):

if isinstance(o, str):

To check if the type of o is exactly str (exclude subclasses):

if type(o) is str:

The following also works, and can be useful in some cases:

if issubclass(type(o), str):

See Built-in Functions in the Python Library Reference for relevant information.

One more note: in this case, if you're using Python 2, you may actually want to use:

if isinstance(o, basestring):

because this will also catch Unicode strings (unicode is not a subclass of str; both str and unicode are subclasses of basestring). Note that basestring no longer exists in Python 3, where there's a strict separation of strings (str) and binary data (bytes).

Alternatively, isinstance accepts a tuple of classes. This will return True if o is an instance of any subclass of any of (str, unicode):

if isinstance(o, (str, unicode)):

Best Answer

Related Solutions

Python – Replacements for switch statement in Python

Python – What’s the canonical way to check for type in Python

Related Topic