Background
I've taught myself Python over the past year-and-a-bit, and would consider myself an intermediate Python user at this point, but never studied computing at school/university. As such, my knowledge of theory is a little weak. Python is the only language I know.
I'm trying to wrap my head around the Liskov Substitution Principle so that I can write better, more object-oriented code.
Question 1: The Python data model
As we know, in Python 3, all custom classes implicitly inherit from object. But the docstring of object includes this line:
When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any.
With this in mind, how can any Python classes that accept a nonzero number of parameters for their constructors be said to comply with the LSP? If I have a class Foo, like so:
class Foo:
    def __init__(self, bar):
        self.bar = bar
then, surely this class definition (and all others like it) violates the contract specified by object? object and Foo cannot be used interchangeably, as object accepts exactly 0 parameters to its constructor, while Foo accepts exactly 1.
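The mismatch described above can be observed directly. The following is a minimal sketch; try_construct is a hypothetical helper introduced only for this illustration, and it simply records whether a call to a class succeeds or raises TypeError.

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar


def try_construct(cls, *args):
    """Hypothetical helper: call cls(*args) and return the new instance,
    or the TypeError that the call raised."""
    try:
        return cls(*args)
    except TypeError as exc:
        return exc


plain = try_construct(object)        # object() accepts no arguments
rejected = try_construct(object, 1)  # an extra argument is refused
foo = try_construct(Foo, 1)          # Foo needs exactly one argument
missing = try_construct(Foo)         # leaving it out is refused as well
```

In both failing cases the caller sees a TypeError, so code that instantiates an arbitrary class without knowing its signature has to be ready for that exception.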
Question 2: collections.Counter
As Raymond Hettinger tells us, the Python dictionary is an excellent example of the open/closed principle. The dict class is "open for extension, closed for modification": if you wish to modify some of the prepackaged behaviours of a dict, you're advised to inherit from collections.abc.MutableMapping or collections.UserDict instead.
collections.Counter, however, is a direct subclass of dict. While it is mostly an extension of dict rather than a modification of behaviours already defined in dict, this isn't true for collections.Counter.fromkeys. With a standard dict, this classmethod is an alternative constructor for a dictionary, but the method is overridden in Counter so that it raises an exception. Here is the comment explaining why this is the case:
# There is no equivalent method for counters because the semantics
# would be ambiguous in cases such as Counter.fromkeys('aaabbc', v=2).
# Initializing counters to zero values isn't necessary because zero
# is already the default value for counter lookups. Initializing
# to one is easily accomplished with Counter(set(iterable)). For
# more exotic cases, create a dictionary first using a dictionary
# comprehension or dict.fromkeys().
The explanation makes sense, but surely this violates the LSP? According to the Wikipedia article, the LSP states that "New exceptions cannot be thrown by the methods in the subtype, except if they are subtypes of exceptions thrown by the methods of the supertype." dict.fromkeys does not throw a NotImplementedError; Counter.fromkeys does.
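The difference between the two classmethods can be checked with a short sketch. The variable names are made up for illustration; the calls themselves are the standard dict.fromkeys and Counter.fromkeys.

```python
from collections import Counter

# dict.fromkeys builds a dictionary from an iterable of keys.
d = dict.fromkeys("aab", 0)

# Counter inherits from dict but overrides fromkeys to refuse the call.
try:
    Counter.fromkeys("aab", 0)
except NotImplementedError as exc:
    counter_error = exc
else:
    counter_error = None

# The alternative suggested in the source comment still works:
# initialising every key to one via a set of the items.
ones = Counter(set("aab"))
```

Here d holds the keys 'a' and 'b' mapped to 0, counter_error holds the NotImplementedError, and ones counts each distinct key once.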
Comments
I'm interested in whether these examples do, in fact, break the LSP. However, I'm also interested in why they break the LSP, if indeed they do. In what situations is enforcing the LSP necessary/advisable/useful? In what situations is worrying about the LSP more of a meaningless distraction? Etc.
Best Answer
I think it's good to go back to the actual definition of the Liskov substitution principle:
Note, the principle only refers to properties of objects. And, implicitly, only public properties, because the Liskov substitution principle is interested in behavioral typing — typing according to observable properties.
With that in mind…
Answer 1
There are two parts to this. First, __init__ is not a constructor. __new__ is the constructor: the method that actually constructs a new class instance from whole cloth. __init__ is just a private method that is called on the new instance after __new__ returns it. And since it's private, it's not part of your type and not subject to the Liskov substitution principle.
What about __new__ then? All the parameters of __init__ are by default implicitly parameters of __new__, so am I just kicking the can down the road? __new__ is a static method, so it's not a property of an instance of object; it's part of object itself. So it's not subject to the Liskov substitution principle for instances of object either.
(Answer 1 Digression)
Here's where it gets kind of interesting (to me, at least).
object is a class, but in Python classes are objects. They're instances of type, which is called their metaclass. So while __new__ is a static method, that means it's an instance method of object itself.1 So, it is subject to the Liskov substitution principle for instances of type. And if we look at the definition of __new__ in type, we see:
So type's __new__ accepts any and all arguments. Since many classes' __init__ methods (and thus their __new__ methods) don't accept arbitrary arguments, those class objects are kind of in violation of the Liskov substitution principle as instances of type. But… as you pointed out later in your question,
And that's exactly what __new__ does. If you call type.__new__ with different arguments than it expects, it throws a TypeError:
Which means that all instances of type (i.e., all class objects) are free to throw their own TypeErrors in __new__, and callers are obligated to handle it. And that's exactly what object.__new__ does, but under different conditions:
So, by weakening the base type's precondition (the argument list) and opting to instead validate “at runtime” by throwing an exception, the __new__ method is able to meet the Liskov substitution principle as a property of instances of type.2
This means we really can't just instantiate any arbitrary type in Python (either by calling __new__ or by just calling the type) without knowing the target type, unless we're prepared to catch and handle TypeErrors, and I think that tracks with most programmers' intuitions.
Answer 2
The answer here is similar to the previous answer. Since fromkeys is a class method, it's not really a property of instances of dict and not subject to the Liskov substitution principle.3
But then, what about if we look at dict as a class object? Do we run into the same complications we did with object.__new__? No, we don't, because Counter and dict don't have any sort of hierarchical relationship as class objects: they're both direct instances of type. We can't assume anything about their fromkeys methods because they didn't inherit them from type.
On the other hand, in Python a class does inherit all its parents' properties, which includes static and class methods like fromkeys. So Counter has to do something with fromkeys. It could attempt to hide the method, e.g., by replacing it with a descriptor that always throws an AttributeError, or even just by setting the property to None. The author of Counter chose to keep the method visible and to throw NotImplementedError instead, perhaps to signal that the method is intentionally unusable.4
In the end, the Liskov substitution principle is just an attempt to formalize something very intuitive: don't surprise the users of your code. In that sense, it may be seen as a necessary condition for “good code” (whatever that is), but not a sufficient condition.
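Returning to the two ways Answer 2 mentions for hiding an inherited fromkeys, here is a rough sketch of what each might look like. All three class names are hypothetical and exist only for this illustration; this is not how Counter is implemented.

```python
class NoFromkeysNone(dict):
    """Hypothetical dict subclass that hides fromkeys by setting it to None."""
    fromkeys = None


class _Hidden:
    """Hypothetical descriptor that makes an attribute look absent."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        # Raised for access through the class and through instances alike.
        raise AttributeError(f"{owner.__name__!r} has no usable {self.name!r}")


class NoFromkeysHidden(dict):
    """Hypothetical dict subclass that hides fromkeys behind the descriptor."""
    fromkeys = _Hidden()
```

With the descriptor, hasattr reports the method as missing on both the class and its instances, while the rest of the dict behaviour is untouched. With None, the attribute is still visible but any attempt to call it fails.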
1 This is a slight lie. __new__ is not an instance method of object because the receiver (cls, a.k.a. self) is not bound automatically; it has to be passed explicitly. It's a static method, so it's essentially just a function property of object, but it doesn't make a difference to the discussion.
2 I put “at runtime” in scare quotes because technically everything in Python is at runtime, but hopefully the distinction is clear.
3 It is possible to call static and class methods through instances, e.g.,
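A minimal sketch of such a call, assuming an ordinary dict bound to the name d1 (the contents and the other variable names are made up for illustration):

```python
d1 = {"a": 1}

# Calling the class method through an instance...
via_instance = d1.fromkeys("xy", 0)

# ...gives the same result as calling it on the class itself.
via_class = dict.fromkeys("xy", 0)
```

The instance d1 is not modified; it only serves as a route to the class.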
This just dispatches the method to type(d1), which is dict. As far as I know, it's pretty well accepted that this is a quirk of Python, and we still think of those methods as properties of the type and not properties of the instance. But I suppose, in the strictest sense, that is a violation of the Liskov substitution principle.
4 According to the docs for NotImplementedError, this is exactly how that exception is not meant to be used. But, I suppose the standard library is allowed to contradict itself.