Python – Multiple Inheritance vs. Decorators for Composable Behaviors

mixins, multiple-inheritance, python

I recently discovered (or rather realised how to use) Python's multiple inheritance, and am afraid I'm now using it in cases where it's not a good fit. I want to have some starting data source (NewsCacheDB, TwitterStream) that gets transformed in various ways (Vectorize, SelectKBest, SelectPercentile).

I found myself writing the following sort of code (Example 1) (the actual code is a bit more complex but the idea is the same). The point being that for ExperimentA and ExperimentB I can define exactly what self.data is, by just relying on class inheritance. Is this really a useful way of achieving the desired behaviour?

I could also use decorators (Example 2). Using the decorators would be less code.

Which approach is preferable? I'm not looking for arguments of the "I like writing decorators better" kind, but rather arguments about

readability
maintainability
testability
pythonicity (yes it's a word).

EXAMPLE 1

class NewsCacheDB(object):
    """Play back cached news articles from a database"""
    def __init__(self):
        super(NewsCacheDB, self).__init__()

    @property
    def data(self):
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here

class TwitterCacheDB(object):
    """Play back cached tweets from a database"""
    def __init__(self):
        super(TwitterCacheDB, self).__init__()

    @property
    def data(self):
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here

class TwitterStream(object):
    def __init__(self):
        super(TwitterStream, self).__init__()

    @property
    def data(self):
        # setup access to live twitter stream
        while stream.isalive():
            yield stream.next()

class Vectorize(object):
    """Turn raw data into numpy vectors"""
    def __init__(self):
        super(Vectorize, self).__init__()

    @property
    def data(self):
        for item in super(Vectorize, self).data:
            transformed = vectorize(item) # slight simplification here
            yield transformed

class SelectKBest(object):
    """Select K best features based on some metric"""
    def __init__(self):
        super(SelectKBest, self).__init__()

    @property
    def data(self):
        for item in super(SelectKBest, self).data:
            transformed = select_kbest(item)  # slight simplification here
            yield transformed

class SelectPercentile(object):
    """Select the top X percentile features based on some metric"""
    def __init__(self):
        super(SelectPercentile, self).__init__()

    @property
    def data(self):
        for item in super(SelectPercentile, self).data:
            transformed = select_percentile(item)  # slight simplification here
            yield transformed

class ExperimentA(SelectKBest, Vectorize, TwitterCacheDB):
    # lots of control code goes here
    pass

class ExperimentB(SelectKBest, Vectorize, NewsCacheDB):
    # lots of control code goes here
    pass

class ExperimentC(SelectPercentile, Vectorize, NewsCacheDB):
    # lots of control code goes here
    pass

EXAMPLE 2

def multiply(fn):
    def wrapped(self):
        return fn(self) * 2
    return wrapped


def twitter_cacheDB(fn):
    def wrapped(self):
        user, password = fn(self)  # 'pass' is a reserved word, so it cannot be a variable name
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here
    return wrapped

def twitter_live(fn):
    def wrapped(self):
        user, password = fn(self)  # 'pass' is a reserved word, so it cannot be a variable name
        # setup access to live twitter stream
        while stream.isalive():
            yield stream.next() # slight simplification here
    return wrapped

def news_cacheDB(fn):
    def wrapped(self):
        user, password = fn(self)  # 'pass' is a reserved word, so it cannot be a variable name
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here
    return wrapped

def vectorize(fn):
    def wrapped(self):
        for item in fn(self):  # the wrapped function needs self
            transformed = do_vectorize(item)  # slight simplification here
            yield transformed
    return wrapped  # a decorator must return the wrapper, not yield it

def select_kbest(fn):
    def wrapped(self):
        for item in fn(self):  # the wrapped function needs self
            transformed = do_selection(item)  # slight simplification here
            yield transformed
    return wrapped  # a decorator must return the wrapper, not yield it

class ExperimentA():
    @property
    @select_kbest
    @vectorize
    @twitter_cacheDB
    def a(self):
        return 'me','123' # return user and pass to connect to DB

class ExperimentB():
    @property
    @select_kbest
    @vectorize
    @news_cacheDB
    def a(self):
        return 'me','123' # return user and pass to connect to DB

Best Answer

Less code, as long as it's readable, is better than more code

From a code size point of view I always go with the solution that requires the least amount of code that is still readable and maintainable. Less code means less chance for defects and less code to maintain.

Multiple Inheritance is not a good choice for Composition

From a design standpoint I would not use multiple inheritance the way you describe, for the following reasons:

attribute/method overriding

You are changing the way data behaves in the different classes. While the initial implementation doesn't directly violate the Open/Closed Principle of OO, any future change has a good chance of modifying behavior in one or more other locations. You are also relying on behavior pulled in through super, which only works correctly if the base classes are ordered correctly in the class definition.
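To make the ordering dependence concrete, here is a minimal sketch. Source and Double are hypothetical stand-ins for a data source and a transformation from the question, using a small in-memory list instead of a database. Swapping the base classes does not raise an error; the transformation is just silently skipped.

```python
class Source:
    """Hypothetical stand-in for a data source such as TwitterCacheDB."""
    @property
    def data(self):
        yield from [1, 2, 3]


class Double:
    """Hypothetical stand-in for a transformation such as Vectorize."""
    @property
    def data(self):
        # Delegates to whichever class comes next in the MRO.
        for item in super().data:
            yield item * 2


class Correct(Double, Source):
    pass


class Swapped(Source, Double):
    pass


# Double.data runs first and pulls items from Source.data.
correct_result = list(Correct().data)

# Source.data is found first and never calls super, so Double is bypassed.
swapped_result = list(Swapped().data)
```

Both classes are valid Python, which is exactly the problem: nothing in the class definition tells the reader that only one of the two orderings produces the intended pipeline.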

fragile tight (vertical) coupling

Relying on the class definition to specify the correct ordering of classes creates a fragile system. It's fragile because you can't simply choose classes that define particular interfaces; you have to know the implemented logic so that the super calls execute in the correct order. The result is also extremely tight coupling. Because it uses class inheritance, we also get vertical coupling, meaning there are implicit dependencies not just within individual methods but potentially between the different layers (classes).

multiple inheritance pitfalls

Multiple inheritance in any language has many pitfalls. Python does some work to fix some issues with inheritance; however, there are numerous ways of unintentionally confusing the method resolution order (MRO) of classes. These pitfalls always exist, and they are a prime reason to avoid multiple inheritance.
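One such pitfall can be shown in a few lines. This is only an illustrative sketch with made-up classes A and B: listing a base class before its own subclass gives Python no consistent linearization, and the class statement itself fails.

```python
class A:
    pass


class B(A):
    pass


mro_error = None
try:
    # A is listed before its own subclass B, so no consistent MRO exists.
    class C(A, B):
        pass
except TypeError as exc:
    mro_error = str(exc)


# Reversing the bases is accepted: the subclass comes before its parent.
class D(B, A):
    pass


d_mro = [cls.__name__ for cls in D.__mro__]
print(mro_error)
print(d_mro)
```

In a deep hierarchy of mixins the conflicting bases are rarely this close together, which makes the resulting TypeError much harder to trace.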

Alternatives

Alternatively, I would leave data-source-specific logic in the classes (i.e. *_CacheDB), then use either decorators or functional composition to apply the generalized transformations automatically.
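A rough sketch of the functional-composition variant might look like the following. The compose helper, the Experiment class, and the placeholder bodies of vectorize and select_kbest are all assumptions made for illustration; the real transformations would do the numpy work from the question.

```python
from functools import reduce


def compose(*steps):
    """Chain generator-transforming functions, applied left to right."""
    def pipeline(source):
        return reduce(lambda stream, step: step(stream), steps, source)
    return pipeline


def vectorize(stream):
    # Placeholder: the real version would build numpy vectors.
    for item in stream:
        yield [len(item)]


def select_kbest(stream):
    # Placeholder: keep only the first feature of each vector.
    for vector in stream:
        yield vector[:1]


class NewsCacheDB:
    """Data source only; a hypothetical in-memory stand-in for the database."""
    def data(self):
        yield from ["first article", "second"]


class Experiment:
    """Holds a source and an explicit, ordered list of transformations."""
    def __init__(self, source, *steps):
        self.source = source
        self.transform = compose(*steps)

    @property
    def data(self):
        return self.transform(self.source.data())


experiment_b = Experiment(NewsCacheDB(), vectorize, select_kbest)
result = list(experiment_b.data)
```

The order of the transformations is now an ordinary argument list that can be read, tested, and changed per experiment, and each step can be unit-tested by passing it a plain list.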

Related Solutions

Multiple Inheritance – Use Cases and Examples

Pros :

It sometimes allows a more obvious modeling of a problem than other approaches.
If the different parents have orthogonal purposes, it can allow a kind of composition.

Cons :

If the different parents don't have orthogonal purpose, it makes the type difficult to understand.
It's not easy to understand how it is implemented in a language (any language).

In C++ a good example of multiple inheritance used to composite orthogonal features is when you use CRTP to, for example, setup a component system for a game.

I started to write an example, but I think a real-world example is more worth looking at. Some code in Ogre3D uses multiple inheritance in a nice and very intuitive way. For example, the Mesh class inherits from both Resource and AnimationContainer. Resource exposes the interface common to all resources, and AnimationContainer exposes the interface specific to manipulating a set of animations. They are not related, so it's easy to think of a Mesh as a resource that can, in addition, contain a set of animations. Feels natural, doesn't it?

You can look at other examples in this library, like the way memory allocation is managed in a fine-grained way by making classes inherit from variants of a CRTP class that overloads new and delete.

As said, the main problems with multiple inheritance arise from mixing related concepts. It forces the language to adopt complex implementations (see the way C++ lets you deal with the diamond problem...) and leaves the user unsure of what is happening in that implementation. For example, read this article explaining how it is implemented in C++.

Removing it from the language helps prevent people who don't know how the feature is implemented from misusing it. But it also forces you to think in a way that sometimes doesn't feel natural; even if these are edge cases, they come up more often than you might think.
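The same shape can be sketched in Python. This is a hypothetical analogy that borrows only the class names from the paragraph above; it does not reproduce Ogre3D's actual C++ interfaces.

```python
class Resource:
    """Interface common to all loadable resources (hypothetical)."""
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)  # cooperative init for the next base
        self.name = name
        self.loaded = False

    def load(self):
        self.loaded = True


class AnimationContainer:
    """Interface for managing a set of animations (hypothetical)."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.animations = {}

    def add_animation(self, name, length):
        self.animations[name] = length


class Mesh(Resource, AnimationContainer):
    """A resource that can also hold animations."""
    pass


mesh = Mesh(name="robot")
mesh.load()
mesh.add_animation("walk", 1.5)
```

Because the two parents share no method names and neither depends on the other's behavior, swapping their order in the Mesh definition would not change what a Mesh does.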

Python – Understanding Decorators in Python

Yes, it probably does, but it probably retains a reference to the original get(self) definition.

Python decorators are nothing more than callables themselves (a function, or a class instance with a __call__ method). Whatever that callable returns is used as the definition for the decorated function instead.

If I define a simple no-op decorator like this, that means that I replace the original with... the original:
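As a small illustration of both kinds of callable, consider the sketch below. The names shout, Repeat, greet and add are invented for this example. It also shows one common way a wrapper keeps a reference to the original function: functools.wraps stores it as __wrapped__.

```python
import functools


def shout(func):
    """Function used as a decorator: its return value replaces func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


class Repeat:
    """An instance of this class is callable, so it works as a decorator."""
    def __init__(self, times):
        self.times = times

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(self.times)]
        return wrapper


twice = Repeat(2)


@shout
def greet(name):
    return "hello " + name


@twice
def add(a, b):
    return a + b


greeting = greet("bob")                  # runs the wrapper
original = greet.__wrapped__("bob")      # the undecorated function
sums = add(1, 2)
```

Here greet now names the wrapper, while the original definition stays reachable through the wrapper's closure and through __wrapped__.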

def noopDecorator(func):
    return func

The @ symbol used for decorators is syntactic sugar; you could also write it as:

class AuthHandler(BaseHandler, tornado.auth.GoogleMixin):
    def get(self):
        if self.get_argument("openid.mode", None):
            self.get_authenticated_user(self.async_callback(self._on_auth))
            return
        self.authenticate_redirect()
    get = tornado.web.asynchronous(get)
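The equivalence can also be checked without Tornado. In this sketch, trace is a made-up decorator that records each call; the two classes differ only in whether they use the @ syntax or the explicit reassignment.

```python
calls = []


def trace(func):
    """Hypothetical decorator that records the name of each called function."""
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


class WithSugar:
    @trace
    def get(self):
        return "ok"


class WithoutSugar:
    def get(self):
        return "ok"
    get = trace(get)  # same effect as writing @trace above the def


sugar_result = WithSugar().get()
plain_result = WithoutSugar().get()
```

Both calls go through the wrapper, so calls ends up with one entry per class.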

In the case of the tornado asynchronous decorator, the decorator probably returns a deferred handler that handles the decorated function asynchronously, keeping you, the programmer of a tornado-based application, from having to remember the intricacies of how to do that time and again. In short, it lets you focus on the details of your application instead.