Liskov Substitution Principle in Python – Explanation and Examples


Background

I've taught myself Python over the past year-and-a-bit, and would consider myself an intermediate Python user at this point, but never studied computing at school/university. As such, my knowledge of theory is a little weak. Python is the only language I know.

I'm trying to wrap my head around the Liskov Substitution Principle so that I can write better, more object-oriented code.

Question 1: The Python data model

As we know, in Python 3, all custom classes implicitly inherit from object. But the docstring of object includes this line:

When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any.

With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP? If I have a class Foo, like so:

class Foo:
    def __init__(self, bar):
        self.bar = bar

then, surely this class definition (and all others like it) violates the contract specified by object? object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.
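To make the mismatch concrete, here is a small sketch that calls each class with the other's argument count and records the resulting error. The `errors` dictionary is just an illustrative name, and the exact wording of the messages may differ between Python versions.

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar


errors = {}
# object rejects one argument; Foo rejects zero arguments.
for cls, args in [(object, (1,)), (Foo, ())]:
    try:
        cls(*args)
    except TypeError as exc:
        errors[cls.__name__] = str(exc)

for name, message in errors.items():
    print(f"{name}: {message}")
```

Both calls fail with a TypeError, which is the sense in which the two constructors are not interchangeable.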

Question 2: collections.Counter

As Raymond Hettinger tells us, the Python dictionary is an excellent example of the open/closed principle. The dict class is "open for extension, closed for modification" — if you wish to modify some of the prepackaged behaviours of a dict, you're advised to inherit from collections.abc.MutableMapping or collections.UserDict instead.

collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys. With a standard dict, this classmethod is an alternative constructor for a dictionary, but the method is overridden in Counter so that it raises an exception. Here is the comment explaining why this is the case:

# There is no equivalent method for counters because the semantics
# would be ambiguous in cases such as Counter.fromkeys('aaabbc', v=2).
# Initializing counters to zero values isn't necessary because zero
# is already the default value for counter lookups.  Initializing
# to one is easily accomplished with Counter(set(iterable)).  For
# more exotic cases, create a dictionary first using a dictionary
# comprehension or dict.fromkeys().

The explanation makes sense — but surely this violates the LSP? According to the Wikipedia article, the LSP states that "New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype." dict.fromkeys does not throw a NotImplementedError; Counter.fromkeys does.
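The difference in behaviour is easy to reproduce. This is a minimal sketch; the variable names `plain` and `counter_error` are only for illustration.

```python
from collections import Counter

# dict.fromkeys is an alternative constructor and works as documented.
plain = dict.fromkeys("abc", 0)
print(plain)  # {'a': 0, 'b': 0, 'c': 0}

# Counter overrides fromkeys so that it always raises.
try:
    Counter.fromkeys("abc", 0)
except NotImplementedError as exc:
    counter_error = exc
else:
    counter_error = None

print(type(counter_error).__name__)  # NotImplementedError
```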

Comments

I'm interested in whether these examples do, in fact, break the LSP. However, I'm also interested in why they break the LSP, if indeed they do. In what situations is enforcing the LSP necessary/advisable/useful? In what situations is worrying about the LSP more of a meaningless distraction? Etc.

Best Answer

I think it's good to go back to the actual definition of the Liskov substitution principle:

Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.

Liskov substitution principle

Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.

With that in mind…

Answer 1

With this in mind, how can any python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP?

There are two parts to this. First, __init__ is not a constructor. __new__ is the constructor: the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called on the new instance once __new__ has returned it (the call is made by type.__call__, which runs when you call the class, not by __new__ itself). And since it's private, it's not part of your type and not subject to the Liskov substitution principle.

What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? Not quite: __new__ is a static method, so it's not a property of an instance of object; it's part of object itself. So it's not subject to the Liskov substitution principle for instances of object either.
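A short sketch of the order in which the two methods run, and of the fact that both receive the same arguments when the class is called. The class `Tracked` and the list `calls` are made-up names for illustration.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        # Runs first and creates the bare instance.
        calls.append(("__new__", args))
        return super().__new__(cls)

    def __init__(self, value):
        # Runs second, on the instance that __new__ returned.
        calls.append(("__init__", (value,)))
        self.value = value


t = Tracked(42)
print(calls)  # [('__new__', (42,)), ('__init__', (42,))]
```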

(Answer 1 Digression)

Here's where it gets kind of interesting (to me, at least). object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, that means it's an instance method of object itself.[1] So, it is subject to the Liskov substitution principle for instances of type. And if we look at the definition of __new__ in type, we see:
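The claim that classes are themselves instances of type can be checked interactively. A minimal sketch, with `Foo` as a placeholder class:

```python
class Foo:
    pass


# Every ordinary class, including object itself, is an instance of type.
print(type(Foo) is type)          # True
print(isinstance(object, type))   # True

# type is its own metaclass.
print(type(type) is type)         # True
```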

__new__(*args, **kwargs) method of builtins.type instance
    Create and return a new object.  See help(type) for accurate signature.

So type's __new__ accepts any and all arguments. Since many classes' __init__ methods (and thus their __new__ methods) don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution principle as instances of type. But… as you pointed out later in your question,

New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype.

Liskov substitution principle

And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:

>>> type.__new__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type.__new__(): not enough arguments

Which means that all instances of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle them. And that's exactly what object.__new__ does, but under different conditions:

>>> object.__new__(object, 'foo', (), {})  # This would be valid for type.__new__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object() takes no arguments

So, by weakening the base type's precondition (the argument list) and opting to instead validate “at runtime” by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.[2]

This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors — and I think that tracks with most programmers' intuitions.
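In practice, generic code that instantiates types it knows nothing about tends to look something like the following sketch. The helper name `try_construct` is hypothetical, and note that catching TypeError this broadly will also hide TypeErrors raised inside the constructor for unrelated reasons.

```python
def try_construct(cls, *args, **kwargs):
    """Return cls(*args, **kwargs), or None if the call raises TypeError."""
    try:
        return cls(*args, **kwargs)
    except TypeError:
        return None


# The same argument list fits some types and not others.
results = {cls.__name__: try_construct(cls, "ab") for cls in (object, list, str)}
print(results)  # {'object': None, 'list': ['a', 'b'], 'str': 'ab'}
```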

Answer 2

collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys.

The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.[3]

But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects — they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.
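The two different relationships can be seen side by side in a short sketch: Counter is a subclass of dict, but as a class object it is an instance of type, not of dict.

```python
from collections import Counter

# Subclass relationship between the two classes.
print(issubclass(Counter, dict))   # True

# As objects, both classes are direct instances of type.
print(type(dict) is type)          # True
print(type(Counter) is type)       # True

# The class object Counter is not itself a dict.
print(isinstance(Counter, dict))   # False
```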

On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.[4]
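For comparison, here is a rough sketch of the two hiding strategies mentioned above, applied to hypothetical dict subclasses. None of these names (`NoFromkeysNone`, `NoFromkeysHidden`, `_Hidden`) come from the standard library, and this is not how Counter is implemented.

```python
class NoFromkeysNone(dict):
    # Option 1: shadow the inherited method with None.
    # Calling it then fails with "'NoneType' object is not callable".
    fromkeys = None


class _Hidden:
    """Descriptor that makes an inherited attribute look missing."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        raise AttributeError(self.name)


class NoFromkeysHidden(dict):
    # Option 2: any lookup of fromkeys raises AttributeError.
    fromkeys = _Hidden()


print(NoFromkeysNone.fromkeys is None)          # True
print(hasattr(NoFromkeysHidden, "fromkeys"))    # False
print(hasattr(NoFromkeysHidden(), "fromkeys"))  # False
```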

In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen as a necessary condition for “good code” (whatever that is), but not a sufficient condition.


[1] This is a slight lie. __new__ is not an instance method of object because it doesn't take the receiver (cls, a.k.a. self) as an implicit parameter. It's a static method, so it's essentially just a function property of object, but it doesn't make a difference to the discussion.

[2] I put “at runtime” in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.

[3] It is possible to call static and class methods through instances, e.g.,

itr = ['a', 'b']  # any iterable of keys
d1 = dict()
d2 = d1.fromkeys(itr)  # same result as dict.fromkeys(itr)

This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.
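A small sketch of what that dispatch looks like for both classes; the variable names are arbitrary. The instance's own contents play no part in the result, and a Counter instance fails where a dict instance succeeds.

```python
from collections import Counter

d = {"x": 1}
made = d.fromkeys("ab")
print(made)  # {'a': None, 'b': None}; the contents of d are ignored

c = Counter("aab")
try:
    c.fromkeys("ab")
except NotImplementedError:
    raised = True
else:
    raised = False

print(raised)  # True
```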

[4] According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used.

Note: It should not be used to indicate that an operator or method is not meant to be supported at all — in that case either leave the operator/method undefined or, if a subclass, set it to None.

But, I suppose the standard library is allowed to contradict itself.