In Python, what are metaclasses and what do we use them for?
Related Solutions
How can I merge two Python dictionaries in a single expression?
For dictionaries x and y, z becomes a shallowly-merged dictionary with values from y replacing those from x.
In Python 3.9.0 or greater (released 17 October 2020): PEP-584, discussed here, was implemented and provides the simplest method:
z = x | y # NOTE: 3.9+ ONLY
In Python 3.5 or greater:
z = {**x, **y}
In Python 2 (or 3.4 or lower), write a function:
def merge_two_dicts(x, y):
    z = x.copy()    # start with keys and values of x
    z.update(y)     # modifies z with keys and values of y
    return z
and now:
z = merge_two_dicts(x, y)
Explanation
Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
The desired result is to get a new dictionary (z) with the values merged, and the second dictionary's values overwriting those from the first.
>>> z
{'a': 1, 'b': 3, 'c': 4}
A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is
z = {**x, **y}
And it is indeed a single expression.
Note that we can merge in with literal notation as well:
z = {**x, 'foo': 1, 'bar': 2, **y}
and now:
>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}
It is shown as implemented in the release schedule for 3.5, PEP 478, and it has made its way into the What's New in Python 3.5 document.
However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:
z = x.copy()
z.update(y) # which returns None since it mutates z
In both approaches, y will come second and its values will replace x's values, thus b will point to 3 in our final result.
Not yet on Python 3.5, but want a single expression
If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant approach that is also correct is to put it in a function:
def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z
and then you have a single expression:
z = merge_two_dicts(x, y)
You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:
def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result
This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:
z = merge_dicts(a, b, c, d, e, f, g)
and key-value pairs in g will take precedence over dictionaries a to f, and so on.
Critiques of Other Answers
Don't use what you see in the formerly accepted answer:
z = dict(x.items() + y.items())
In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items objects together, not two lists:
>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.
Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined with regard to precedence. So don't do this:
>>> c = dict(a.items() | b.items())
This example demonstrates what happens when values are unhashable:
>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:
>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}
Another hack you should not use:
z = dict(x, **y)
This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process), but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.
Here's an example of the usage being remediated in Django.
Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.
>>> c = dict(a, **b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
From the mailing list, Guido van Rossum, the creator of the language, wrote:
I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.
and
Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally, I find it more despicable than cool.
It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:
dict(a=1, b=10, c=11)
instead of
{'a': 1, 'b': 10, 'c': 11}
Response to comments
Despite what Guido says, dict(x, **y) is in line with the dict specification, which by the way works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a shortcoming of dict. Nor is using the ** operator in this place an abuse of the mechanism; in fact, ** was designed precisely to pass dictionaries as keywords.
Again, it doesn't work for 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:
>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}
This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.
I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.
More comments:
dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.
My response: merge_two_dicts(x, y) actually seems much clearer to me, if we're actually concerned about readability. And it is not forward compatible, as Python 2 is increasingly deprecated.
{**x, **y} does not seem to handle nested dictionaries. The contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.
Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's - in a single expression.
Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:
from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z
Usage:
>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}
Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".
Less Performant But Correct Ad-hocs
These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence).
You can also chain the dictionaries manually inside a dict comprehension:
{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7
or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):
dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2
itertools.chain will chain the iterators over the key-value pairs in the correct order:
from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2
Performance Analysis
I'm only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)
from timeit import repeat
from itertools import chain
x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')
def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z
min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
In Python 3.8.1, NixOS:
>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux
Resources on Dictionaries
- My explanation of Python's dictionary implementation, updated for 3.6.
- Answer on how to add new keys to a dictionary
- Mapping two lists into a dictionary
- The official Python docs on dictionaries
- The Dictionary Even Mightier - talk by Brandon Rhodes at Pycon 2017
- Modern Python Dictionaries, A Confluence of Great Ideas - talk by Raymond Hettinger at Pycon 2017
Variables declared inside the class definition, but not inside a method are class or static variables:
>>> class MyClass:
... i = 3
...
>>> MyClass.i
3
As @millerdev points out, this creates a class-level i variable, but this is distinct from any instance-level i variable, so you could have
>>> m = MyClass()
>>> m.i = 4
>>> MyClass.i, m.i
(3, 4)
This is different from C++ and Java, but not so different from C#, where a static member can't be accessed using a reference to an instance.
See what the Python tutorial has to say on the subject of classes and class objects.
@Steve Johnson has already answered regarding static methods, also documented under "Built-in Functions" in the Python Library Reference.
class C:
    @staticmethod
    def f(arg1, arg2, *args):
        ...
@beidy recommends classmethods over staticmethod, as the method then receives the class type as the first argument, but I'm still a little fuzzy on the advantages of this approach over staticmethod. If you are too, then it probably doesn't matter.
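If it helps, here is a minimal sketch of the difference (method names are illustrative): a classmethod receives the class as its implicit first argument, while a staticmethod receives nothing implicit.
class C:
    @staticmethod
    def static_f(x):
        # no implicit first argument; just a plain function stored on the class
        return x * 2

    @classmethod
    def class_f(cls, x):
        # receives the class (C, or the subclass it was called on) as the first argument
        return cls.__name__, x * 2

print(C.static_f(3))   # 6
print(C.class_f(3))    # ('C', 6)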
Best Answer
Classes as objects
Before understanding metaclasses, you need to master classes in Python. And Python has a very peculiar idea of what classes are, borrowed from the Smalltalk language.
In most languages, classes are just pieces of code that describe how to produce an object. That's kinda true in Python too:
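>>> class ObjectCreator(object):
...     pass
...
>>> my_object = ObjectCreator()
>>> print(my_object)
<__main__.ObjectCreator object at 0x...>    # the exact address will vary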
But classes are more than that in Python. Classes are objects too.
Yes, objects.
As soon as you use the keyword class, Python executes it and creates an object. A class statement like the one above creates in memory an object with the name ObjectCreator.
This object (the class) is itself capable of creating objects (the instances), and this is why it's a class.
But still, it's an object, and therefore you can assign it to a variable, add attributes to it, and pass it as a function parameter. For example (a rough sketch; helper names like echo are just for illustration):
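>>> print(ObjectCreator)                   # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
...     print(o)
...
>>> echo(ObjectCreator)                    # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> ObjectCreator.new_attribute = 'foo'    # you can add attributes to a class
>>> hasattr(ObjectCreator, 'new_attribute')
True
>>> ObjectCreatorMirror = ObjectCreator    # you can assign a class to a variable
>>> print(ObjectCreatorMirror())
<__main__.ObjectCreator object at 0x...>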
Creating classes dynamically
Since classes are objects, you can create them on the fly, like any object.
First, you can create a class in a function using the class keyword. A rough sketch (choose_class is an illustrative name):
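>>> def choose_class(name):
...     if name == 'foo':
...         class Foo(object):
...             pass
...         return Foo      # return the class, not an instance
...     else:
...         class Bar(object):
...             pass
...         return Bar
...
>>> MyClass = choose_class('foo')
>>> print(MyClass)          # the function returns a class, not an instance
<class '__main__.choose_class.<locals>.Foo'>
>>> print(MyClass())        # you can create an object from this class
<__main__.choose_class.<locals>.Foo object at 0x...>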
But it's not so dynamic, since you still have to write the whole class yourself.
Since classes are objects, they must be generated by something.
When you use the class keyword, Python creates this object automatically. But as with most things in Python, it gives you a way to do it manually.
Remember the function type? The good old function that lets you know what type an object is:
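>>> print(type(1))
<class 'int'>
>>> print(type("1"))
<class 'str'>
>>> print(type(ObjectCreator))
<class 'type'>
>>> print(type(ObjectCreator()))
<class '__main__.ObjectCreator'>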
Well, type also has a completely different ability: it can create classes on the fly. type can take the description of a class as parameters and return a class.
(I know, it's silly that the same function can have two completely different uses according to the parameters you pass to it. It's an issue due to backward compatibility in Python.)
type works this way: type(name, bases, attrs), where:
- name: name of the class
- bases: tuple of the parent classes (for inheritance; can be empty)
- attrs: dictionary containing attribute names and values
For example, this class:
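>>> class MyShinyClass(object):
...     pass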
can be created manually this way:
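>>> MyShinyClass = type('MyShinyClass', (), {})  # returns a class object
>>> print(MyShinyClass)
<class '__main__.MyShinyClass'>
>>> print(MyShinyClass())                        # create an instance with the class
<__main__.MyShinyClass object at 0x...>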
You'll notice that we use MyShinyClass both as the name of the class and as the variable to hold the class reference. They can be different, but there is no reason to complicate things.
type accepts a dictionary to define the attributes of the class. So a class written the usual way can be translated into a manual type call, as in this sketch (Foo and bar are the names used for the rest of the example):
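class Foo(object):
    bar = True

# ...can be translated to:
Foo = type('Foo', (), {'bar': True})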
And used as a normal class:
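>>> print(Foo)
<class '__main__.Foo'>
>>> print(Foo.bar)
True
>>> f = Foo()
>>> print(f)
<__main__.Foo object at 0x...>
>>> print(f.bar)
True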
And of course, you can inherit from it, so:
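class FooChild(Foo):
    pass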
would be:
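FooChild = type('FooChild', (Foo,), {})
print(FooChild)        # <class '__main__.FooChild'>
print(FooChild.bar)    # True, bar is inherited from Foo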
Eventually, you'll want to add methods to your class. Just define a function with the proper signature and assign it as an attribute.
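A sketch (echo_bar is just an illustrative name):
>>> def echo_bar(self):
...     print(self.bar)
...
>>> FooChild = type('FooChild', (Foo,), {'echo_bar': echo_bar})
>>> hasattr(Foo, 'echo_bar')
False
>>> hasattr(FooChild, 'echo_bar')
True
>>> my_foo = FooChild()
>>> my_foo.echo_bar()
True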
And you can add even more methods after you dynamically create the class, just like adding methods to a normally created class object.
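For instance:
>>> def echo_bar_more(self):
...     print('yet another method')
...
>>> FooChild.echo_bar_more = echo_bar_more
>>> hasattr(FooChild, 'echo_bar_more')
True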
You see where we are going: in Python, classes are objects, and you can create a class on the fly, dynamically.
This is what Python does when you use the keyword class, and it does so by using a metaclass.
What are metaclasses (finally)
Metaclasses are the 'stuff' that creates classes.
You define classes in order to create objects, right?
But we learned that Python classes are objects.
Well, metaclasses are what create these objects. They are the classes' classes, you can picture them this way:
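MyClass = type('MyClass', (), {})   # a metaclass (here, type) creates a class object...
my_object = MyClass()               # ...and the class creates an instance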
You've seen that type lets you do something like this:
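MyClass = type('MyClass', (), {})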
It's because the function type is in fact a metaclass. type is the metaclass Python uses to create all classes behind the scenes.
Now you wonder "why the heck is it written in lowercase, and not Type?"
Well, I guess it's a matter of consistency with str, the class that creates string objects, and int, the class that creates integer objects. type is just the class that creates class objects.
You see that by checking the __class__ attribute.
Everything, and I mean everything, is an object in Python. That includes integers, strings, functions and classes. All of them are objects. And all of them have been created from a class:
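>>> age = 35
>>> age.__class__
<class 'int'>
>>> name = 'bob'
>>> name.__class__
<class 'str'>
>>> def foo(): pass
...
>>> foo.__class__
<class 'function'>
>>> class Bar(object): pass
...
>>> b = Bar()
>>> b.__class__
<class '__main__.Bar'>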
Now, what is the __class__ of any __class__?
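>>> age.__class__.__class__
<class 'type'>
>>> name.__class__.__class__
<class 'type'>
>>> foo.__class__.__class__
<class 'type'>
>>> b.__class__.__class__
<class 'type'>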
So, a metaclass is just the stuff that creates class objects.
You can call it a 'class factory' if you wish.
type is the built-in metaclass Python uses, but of course, you can create your own metaclass.
The __metaclass__ attribute
In Python 2, you can add a __metaclass__ attribute when you write a class (see the next section for the Python 3 syntax):
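# Python 2 syntax (in Python 3 this attribute is simply ignored; see below).
# The value can be any callable able to create a class; here we just point
# at type itself as a stand-in for your own metaclass.
class Foo(object):
    __metaclass__ = type
    # rest of the class body...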
If you do so, Python will use the metaclass to create the class Foo.
Careful, it's tricky.
You write class Foo(object) first, but the class object Foo is not created in memory yet.
Python will look for __metaclass__ in the class definition. If it finds it, it will use it to create the class object Foo. If it doesn't, it will use type to create the class.
Read that several times.
When you do:
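# assuming a parent class Bar exists
class Bar(object):
    pass

class Foo(Bar):
    pass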
Python does the following:
- Is there a __metaclass__ attribute in Foo? If yes, Python creates in memory a class object (I said a class object, stay with me here) with the name Foo, using what is in __metaclass__.
- If Python can't find __metaclass__, it will look for a __metaclass__ at the MODULE level, and try to do the same (but only for classes that don't inherit anything, basically old-style classes).
- Then, if it can't find any __metaclass__ at all, it will use Bar's (the first parent's) own metaclass (which might be the default type) to create the class object.
Be careful here: the __metaclass__ attribute will not be inherited, but the metaclass of the parent (Bar.__class__) will be. If Bar used a __metaclass__ attribute that created Bar with type() (and not type.__new__()), the subclasses will not inherit that behavior.
Now the big question is: what can you put in __metaclass__?
The answer is: something that can create a class.
And what can create a class? type, or anything that subclasses or uses it.
Metaclasses in Python 3
The syntax to set the metaclass has been changed in Python 3:
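# Python 3 syntax: the metaclass is passed as a keyword argument in the base-class list.
# MyMetaclass is an illustrative name; any subclass of type (or suitable callable) works.
class MyMetaclass(type):
    pass

class Foo(object, metaclass=MyMetaclass):
    pass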
i.e. the __metaclass__ attribute is no longer used, in favor of a keyword argument in the list of base classes.
The behavior of metaclasses, however, stays largely the same.
One thing added to metaclasses in Python 3 is that you can also pass attributes as keyword arguments into a metaclass. Read the section below for how Python handles this.
Custom metaclasses
The main purpose of a metaclass is to change the class automatically, when it's created.
You usually do this for APIs, where you want to create classes matching the current context.
Imagine a stupid example, where you decide that all classes in your module should have their attributes written in uppercase. There are several ways to do this, but one way is to set __metaclass__ at the module level.
This way, all classes of this module will be created using this metaclass, and we just have to tell the metaclass to turn all attributes to uppercase.
Luckily, __metaclass__ can actually be any callable, it doesn't need to be a formal class (I know, something with 'class' in its name doesn't need to be a class, go figure... but it's helpful).
So we will start with a simple example, by using a function.
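A sketch of this in Python 2 syntax (the module-level __metaclass__ hook only exists in Python 2; upper_attr is an illustrative name):
# the metaclass is automatically passed the same arguments you usually pass to type
def upper_attr(future_class_name, future_class_parents, future_class_attrs):
    """Return a class object, with its attribute names turned to uppercase."""
    # pick up any attribute that doesn't start with '__' and uppercase it
    uppercase_attrs = {
        attr if attr.startswith('__') else attr.upper(): v
        for attr, v in future_class_attrs.items()
    }
    # let type do the actual class creation
    return type(future_class_name, future_class_parents, uppercase_attrs)

__metaclass__ = upper_attr  # affects all classes in the module (Python 2 only)

class Foo():  # the module-level hook won't apply to classes inheriting from object
    # we could define __metaclass__ here instead to affect only this class
    bar = 'bip'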
Let's check:
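# expected result under Python 2, where the module-level __metaclass__ hook applies:
>>> hasattr(Foo, 'bar')
False
>>> hasattr(Foo, 'BAR')
True
>>> Foo.BAR
'bip'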
Now, let's do exactly the same, but using a real class for a metaclass:
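# a sketch using a real class as the metaclass; remember that type is itself a class,
# like str and int, so you can inherit from it
class UpperAttrMetaclass(type):
    # __new__ is the method called before __init__; it creates the object and returns it,
    # while __init__ just initializes the object passed as a parameter
    def __new__(upperattr_metaclass, future_class_name,
                future_class_parents, future_class_attrs):
        uppercase_attrs = {
            attr if attr.startswith('__') else attr.upper(): v
            for attr, v in future_class_attrs.items()
        }
        return type(future_class_name, future_class_parents, uppercase_attrs)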
Let's rewrite the above, but with shorter and more realistic variable names now that we know what they mean:
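class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith('__') else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type(clsname, bases, uppercase_attrs)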
You may have noticed the extra argument cls. There is nothing special about it: __new__ always receives the class it's defined in as the first parameter, just like you have self for ordinary methods (which receive the instance as the first parameter) or the defining class for class methods.
But this is not proper OOP. We are calling type directly and we aren't overriding or calling the parent's __new__. Let's do that instead:
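class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith('__') else attr.upper(): v
            for attr, v in attrs.items()
        }
        return type.__new__(cls, clsname, bases, uppercase_attrs)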
We can make it even cleaner by using super, which will ease inheritance (because yes, you can have metaclasses, inheriting from metaclasses, inheriting from type):
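class UpperAttrMetaclass(type):
    def __new__(cls, clsname, bases, attrs):
        uppercase_attrs = {
            attr if attr.startswith('__') else attr.upper(): v
            for attr, v in attrs.items()
        }
        # Python 3; in Python 2 you would write super(UpperAttrMetaclass, cls).__new__(...)
        return super().__new__(cls, clsname, bases, uppercase_attrs)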
Oh, and in Python 3, if you do this call with keyword arguments, the keyword arguments are passed through to the metaclass, which has to accept them in __new__ (and __init__). A sketch of both sides of this (MyMetaclass and kwarg1 are illustrative names):
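class MyMetaclass(type):
    def __new__(cls, clsname, bases, attrs, **kwargs):
        print(kwargs)                   # {'kwarg1': 'value1'}
        return super().__new__(cls, clsname, bases, attrs)

    def __init__(cls, clsname, bases, attrs, **kwargs):
        super().__init__(clsname, bases, attrs)

class Thing(object, metaclass=MyMetaclass, kwarg1='value1'):
    pass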
That's it. There is really nothing more about metaclasses.
The reason behind the complexity of code using metaclasses is not metaclasses themselves; it's that you usually use metaclasses to do twisted stuff relying on introspection, manipulating inheritance, vars such as __dict__, etc.
Indeed, metaclasses are especially useful to do black magic, and therefore complicated stuff. But by themselves, they are simple: they intercept a class creation, modify the class, and return the modified class.
Why would you use classes instead of functions for metaclasses?
Since __metaclass__ can accept any callable, why would you use a class, when a class is obviously more complicated? There are several reasons to do so:
- The intention is clear: when you read UpperAttrMetaclass(type), you know what's going to follow.
- You can hook on __new__, __init__ and __call__, which allows you to do different things. Even if usually you can do it all in __new__, some people are just more comfortable using __init__.
Why would you use metaclasses?
Now the big question. Why would you use some obscure error-prone feature?
Well, usually you don't:
"Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why)."
Python Guru Tim Peters
The main use case for a metaclass is creating an API. A typical example of this is the Django ORM. It allows you to define something like this:
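# a Django-style model definition (requires Django; shown only to illustrate the API)
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=30)
    age = models.IntegerField()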
But if you do this:
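person = Person(name='bob', age=35)
print(person.age)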
It won't return an IntegerField object. It will return an int, and can even take it directly from the database.
This is possible because models.Model defines __metaclass__ and it uses some magic that will turn the Person you just defined with simple statements into a complex hook to a database field.
Django makes something complex look simple by exposing a simple API and using metaclasses, recreating code from this API to do the real job behind the scenes.
The last word
First, you know that classes are objects that can create instances.
Well, in fact, classes are themselves instances. Of metaclasses.
Everything is an object in Python, and they are all either instances of classes or instances of metaclasses.
Except for type. type is actually its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.
Secondly, metaclasses are complicated. You may not want to use them for very simple class alterations. You can change classes by using two different techniques:
- monkey patching
- class decorators
99% of the time you need class alteration, you are better off using these.
But 98% of the time, you don't need class alteration at all.