I want to use Celery to run jobs on a GPU server with four Tesla cards. I run the Celery worker with a pool of four workers such that each card always runs one job.
My problem is how to instruct the workers to each claim one GPU. Currently I rely on the assumption that the worker processes should all have contiguous process IDs:
device_id = os.getpid() % self.ndevices
However, this is not guaranteed to always work, e.g. when worker processes get restarted over time. So ideally, I would like to get the ID of each worker directly. Can someone tell me whether it is possible to inspect the worker from within a task, or suggest a different solution to distribute the jobs across the GPUs?
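To make the concern concrete, here is a small sketch of how the modulo scheme can break. The PIDs are made up for illustration, and devices_from_pids is a hypothetical helper, not part of Celery:

```python
def devices_from_pids(pids, ndevices):
    """Device ids that the PID-modulo scheme would assign to the given PIDs."""
    return [pid % ndevices for pid in pids]


# Hypothetical PIDs: four contiguous ones, then the same pool after the
# process with PID 1002 was replaced by a new one with PID 1004.
contiguous = devices_from_pids([1000, 1001, 1002, 1003], 4)
after_restart = devices_from_pids([1000, 1001, 1004, 1003], 4)

print(contiguous)     # [0, 1, 2, 3]
print(after_restart)  # [0, 1, 0, 3]
```

After the restart, two processes map to device 0 and none to device 2.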
Best Answer
If you are using CELERYD_POOL = 'processes', the worker pool is handled by billiard, which does happen to expose its 0-based process index.
The index is 0-based, and if a worker happens to be restarted it will keep its index. I couldn't find any documentation regarding the index value though :/
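A minimal sketch of how the index could be mapped to a GPU, assuming billiard's current_process() returns the pool process object that carries the index attribute described above. The helper names device_id_for and claim_device are hypothetical, and NDEVICES = 4 simply mirrors the setup in the question:

```python
NDEVICES = 4  # assumption: four GPUs, as in the question


def device_id_for(process, ndevices=NDEVICES):
    """Map a pool process to a device id using its 0-based pool index."""
    index = getattr(process, "index", None)
    if index is None:
        # Processes outside the pool (e.g. the main process) are assumed
        # not to carry an index, so fail loudly instead of guessing.
        raise RuntimeError("current process has no pool index")
    # The modulo only matters if the pool is larger than the number of GPUs.
    return index % ndevices


def claim_device(ndevices=NDEVICES):
    """Return the device id for the pool process this code runs in."""
    # Imported lazily so device_id_for can be used without billiard installed.
    from billiard import current_process

    return device_id_for(current_process(), ndevices)
```

Calling claim_device() at the start of a task body would then replace the os.getpid() % self.ndevices line from the question. Since the index attribute appears to be undocumented, it may be worth checking that it still exists after upgrading Celery or billiard.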