Everything in Python is an object, including classes.
This means you can reference classes, pass them around as arguments, store them in attributes, names, lists, dictionaries, and so on.
This is perfectly normal in Python:
class_map = {
'foo': A,
'bar': SomeOtherClass,
'baz': YetAnother,
}
instance = class_map[some_variable]()
Now it depends on some_variable which class is picked to create an instance; the class_map dictionary values are all classes.
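To make the dispatch above concrete, here is a runnable sketch with hypothetical placeholder classes standing in for A and the others:

```python
class Circle:
    """Hypothetical stand-in for A."""

class Square:
    """Hypothetical stand-in for SomeOtherClass."""

# The dictionary values are the class objects themselves, not instances.
class_map = {
    'circle': Circle,
    'square': Square,
}

shape_name = 'circle'
instance = class_map[shape_name]()  # look up the class, then call it
```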
Classes are themselves instances of a type, which is called their metaclass. You can produce a new class by calling type() with a name, a sequence of base classes, and a mapping defining the attributes of the class:
type('DynamicClass', (), {'foo': 'bar'})
creates a new class object with a foo attribute set to 'bar', for example. The class produced can itself then be used to create instances. So classes are produced by metaclasses, just as instances are produced by classes.
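A quick sketch of the three-argument type() call and the class it produces:

```python
# type(name, bases, namespace) builds a class object dynamically.
DynamicClass = type('DynamicClass', (), {'foo': 'bar'})

obj = DynamicClass()         # the new class can itself create instances
print(obj.foo)               # bar
print(type(DynamicClass))    # <class 'type'>, i.e. its metaclass
```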
You can produce your own metaclasses by inheriting from type, opening up a weird and wonderful world of class behaviour.
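As a minimal illustration (the names here are mine, not from the original), a metaclass that stamps an attribute onto every class it creates:

```python
class Meta(type):
    def __new__(mcls, name, bases, namespace):
        # This runs when the *class* is created, not when instances are.
        namespace.setdefault('created_by', mcls.__name__)
        return super().__new__(mcls, name, bases, namespace)

class Widget(metaclass=Meta):
    pass

print(Widget.created_by)   # Meta
```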
Calling an unbound method by passing in a separate instance is not really a good example of using classes as objects. All you did was use the initial reference (the class name) to look up the method, then pass in an instance of the class as the first parameter to stand in for self.
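For reference, that pattern looks like this (the class here is a hypothetical example of mine):

```python
class Greeter:
    def greet(self):
        return 'hello'

g = Greeter()

# Looking the method up on the class and passing the instance explicitly
# is equivalent to the ordinary bound call:
assert Greeter.greet(g) == g.greet()
```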
I think it's good to go back to the actual definition of the Liskov substitution principle:
Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.
— Liskov substitution principle
Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.
With that in mind…
Answer 1
With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP?
There are two parts to this. First, __init__ is not a constructor. __new__ is the constructor: the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called automatically on the new instance after __new__ returns it. And since it's private, it's not part of your type and not subject to the Liskov substitution principle.
What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? No: __new__ is a static method, so it's not a property of an instance of object; it's part of object itself. So it's not subject to the Liskov substitution principle for instances of object either.
(Answer 1 Digression)
Here's where it gets kind of interesting (to me, at least). object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, in a sense that makes it an instance method of object itself.1 So it is subject to the Liskov substitution principle for instances of type. And if we look at the definition of __new__ in type, we see:
__new__(*args, **kwargs) method of builtins.type instance
Create and return a new object. See help(type) for accurate signature.
So type's __new__ accepts any and all arguments. Since many classes' __init__ methods, and thus their __new__ methods, don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution principle as instances of type. But… as you pointed out later in your question,
New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.
— Liskov substitution principle
And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:
>>> type.__new__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: type.__new__(): not enough arguments
Which means that all subtypes of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle it. And that's exactly what object.__new__ does, but under different conditions:
>>> object.__new__(object, 'foo', (), {}) # This would be valid for type.__new__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object() takes no arguments
So, by keeping the base type's precondition maximally weak (the argument list accepts anything) and opting to instead validate "at runtime" by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.2
This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors, and I think that tracks with most programmers' intuitions.
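That intuition can be sketched as a small helper (the function name is mine, purely for illustration):

```python
def try_instantiate(cls, *args):
    """Call cls(*args); return None when the signature doesn't match.

    We have to be prepared for TypeError, since any class is free to
    reject arguments in its __new__ or __init__.
    """
    try:
        return cls(*args)
    except TypeError:
        return None

print(try_instantiate(dict))            # {}
print(try_instantiate(object, 'foo'))   # None: object() takes no arguments
```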
Answer 2
collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys.
The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.3
But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects; they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.
On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.4
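You can see the difference in behaviour directly:

```python
from collections import Counter

# dict.fromkeys works as usual...
print(dict.fromkeys('ab', 0))    # {'a': 0, 'b': 0}

# ...but Counter deliberately disables its inherited fromkeys.
try:
    Counter.fromkeys('ab', 0)
except NotImplementedError as exc:
    print(type(exc).__name__)    # NotImplementedError
```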
In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen as a necessary condition for "good code" (whatever that is), but not a sufficient condition.
1 This is a slight lie. __new__ is not an instance method of object because it doesn't take the receiver (cls, a.k.a. self) as a parameter. It's a static method, so it's essentially just a function property of object, but it doesn't make a difference to the discussion.
2 I put “at runtime” in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.
3 It is possible to call static and class methods through instances, e.g.,
d1 = dict()
d2 = d1.fromkeys(itr)
This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.
4 According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used.
Note: It should not be used to indicate that an operator or method is not meant to be supported at all — in that case either leave the operator/method undefined or, if a subclass, set it to None.
But, I suppose the standard library is allowed to contradict itself.
Best Answer
Properly structured is subjective. :)
Generally, the more moving parts you have (i.e. mutable state), the harder it is to reason about your code. There are more pieces to fit into your mental model of the code, but there are additional concerns as well, such as ensuring the derived values stay coherent when a value is updated, or that your code is correct if run in a concurrent context. Therefore, I would consider leaving the calculation of the derived attributes in a method a good default, since it minimizes the state required.
Not all calculations are equal, however. It might be too computationally expensive to re-calculate the derived values every time they are needed (e.g. in a tight loop), at which point you might want to consider caching. The good thing is that if the computation is hidden in a method, caching simply becomes an implementation detail, leaving the rest of your code unaware of this optimization. If your use case warrants it, you may even use the Decorator pattern to implement this caching, decoupling the actual, interesting calculation from the technical details of caching.
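In Python, one low-ceremony way to get that caching, assuming the underlying values don't change after first access (and Python 3.8+), is functools.cached_property; the class here is a hypothetical example:

```python
import functools

class Dataset:
    def __init__(self, values):
        self.values = list(values)

    @functools.cached_property
    def mean(self):
        # Computed once on first access, then stored on the instance;
        # callers remain unaware of the optimization.
        return sum(self.values) / len(self.values)

d = Dataset([1, 2, 3, 4])
print(d.mean)   # 2.5
```

Note the caveat: because the result is cached on the instance, mutating self.values afterwards will not refresh mean.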
More specifically for Python, you can define your derived value as a property instead of a method. You get the benefit of keeping state to a minimum while still seemingly using an attribute (if that makes sense in your use case). That being said, a property is generally expected not to perform an expensive operation, so if that is your case, I would generally prefer to keep the method.
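A minimal sketch of a derived value as a property (the class and attribute names are hypothetical):

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Derived on demand: no stored state that could go stale.
        return self.width * self.height

r = Rectangle(3, 4)
print(r.area)   # 12
r.width = 5
print(r.area)   # 20, stays coherent automatically
```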