Python – How to pass complex ORM objects to background workers

coding-style orm python

I've been wondering how best to pass complex ORM objects to background workers. In my case that means Django models and Celery, but the question could apply to any similar background processing framework.

I usually pass model objects "by reference" using the primary key. In quick Python, this is how I would send an email for a user:

@app.task
def background_task(user_id):
    user = User.objects.get(id=user_id)
    send_email(user.first_name, user.last_name)

background_task.delay(user.id)

A coworker instead prefers to pass the values directly to the task:

@app.task
def background_task(user_first_name, user_last_name):
    send_email(user_first_name, user_last_name)

background_task.delay(user.first_name, user.last_name)

I find the first method much easier to maintain:

  • We have tasks using the second paradigm taking 10+ arguments coming from a single object, and these are quite hard to read and reuse because one needs to make sure that all the arguments are accounted for.

  • Adding more information to the task means that you need to add more arguments, and make sure that they are also added to all the places that use this task, which is time-consuming and error-prone.
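
One possible middle ground for the long-argument-list problem, sketched with the standard library only: bundle the values into a single JSON-serializable payload so the task keeps a one-argument signature. `UserSnapshot` and `build_payload` are hypothetical names, and `background_task` is a plain function here (in a real project it would be a Celery task) so that the sketch runs without a broker.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UserSnapshot:
    """Hypothetical value object holding only the fields the task needs."""
    first_name: str
    last_name: str


def build_payload(snapshot: UserSnapshot) -> str:
    # A single JSON string replaces a long positional argument list.
    return json.dumps(asdict(snapshot))


def background_task(payload: str) -> str:
    # Stand-in for a Celery task: rebuild the snapshot from the payload.
    snapshot = UserSnapshot(**json.loads(payload))
    return f"Hello {snapshot.first_name} {snapshot.last_name}"


payload = build_payload(UserSnapshot(first_name="Ada", last_name="Lovelace"))
message = background_task(payload)
```

Adding a field then means changing the snapshot class and the task body, not every call site's argument list.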

The second method has the following advantages:

  • You do not run the risk of using the wrong information. For instance, with the first method, a model could be updated twice before the task runs, so the task only sees the second update and the first one is never processed.
  • You also do not run a second query when the task runs since task arguments are stored along with the task, which could lighten the load on the database.
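
The stale-data risk described above can be reproduced without Django or Celery. The following is only a simulation: `database`, `queue`, and the helper functions are hypothetical in-memory stand-ins for the real table, broker, and worker.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the database table and the task queue.
database = {1: {"first_name": "Ada"}}
queue = deque()


def enqueue_by_reference(user_id):
    # Only the primary key is stored with the task.
    queue.append(("by_reference", user_id))


def enqueue_by_value(first_name):
    # The value itself is copied into the task at enqueue time.
    queue.append(("by_value", first_name))


def run_worker():
    results = []
    while queue:
        kind, argument = queue.popleft()
        if kind == "by_reference":
            # Re-read the row: the task sees whatever is stored now.
            results.append(database[argument]["first_name"])
        else:
            results.append(argument)
    return results


# First update, with one task of each style enqueued for it.
database[1]["first_name"] = "Grace"
enqueue_by_reference(1)
enqueue_by_value(database[1]["first_name"])

# A second update lands before the worker gets to run.
database[1]["first_name"] = "Linus"

results = run_worker()
```

Here the by-reference task ends up working with "Linus" while the by-value task still holds "Grace". Which of the two is the "right" value depends on whether the task should act on the latest state or on the state at the time it was queued.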

I am rather biased towards the first method, but the two of us were unable to reach a conclusion on this.

  • Are there advantages or disadvantages that I did not list?
  • Can you think of some projects using either of these that we could use as examples?
  • How should we handle this?

Best Answer

To pass a model object to a Celery task, you can use Django's built-in serializer:
https://docs.djangoproject.com/en/2.1/topics/serialization/

from django.core import serializers

# serialize() expects an iterable of model instances (a queryset or a list).
data = serializers.serialize("json", SomeModel.objects.all())

Pass this serialised data as an argument to the Celery task, where you can deserialise it back into objects:

for obj in serializers.deserialize("json", data):
    # Each item is a DeserializedObject; obj.object is the unsaved model instance.
    do_something_with(obj)