Duck Typing, Data Validation, and Assertive Programming in Python

assertionsduck-typingpythonvalidation

About duck typing:

Duck typing is aided by habitually not testing for the type of arguments in method and function bodies, relying on documentation, clear code and testing to ensure correct use.

About argument validation (EAFP: Easier to ask for forgiveness than permission). An adapted example from here:

…it is considered more pythonic to do :

def my_method(self, key):
    try:
        value = self.a_dict[member]
    except TypeError:
        # do something else

This means that anyone else using your code doesn't have to use a real dictionary or subclass – they can use any object that implements the mapping interface.

Unfortunately in practise it's not that simple. What if member in the above example might be an integer ? Integers are immutable – so it's perfectly reasonable to use them as dictionary keys. However they are also used to index sequence type objects. If member happens to be an integer then example two could let through lists and strings as well as dictionaries.

About assertive programming:

Assertions are a systematic way to check that the internal state of a program is as the programmer expected, with the goal of catching bugs. In particular, they're good for catching false assumptions that were made while writing the code, or abuse of an interface by another programmer. In addition, they can act as in-line documentation to some extent, by making the programmer's assumptions obvious. ("Explicit is better than implicit.")

The mentioned concepts are sometimes in conflict, so i count on the following factors when choosing if i don't do any data validation at all, do strong validation or use asserts:

  1. Strong validation. By strong validation I mean raising a custom Exception (ApiError for example). If my function/method is part of a public API, it's better to validate the argument to show a good error message about unexpected type. By checking the type I do not mean using only isinstance, but also if the object passed supports the needed interface (duck typing). While I document the API and specify the expected type and the user might want to use my function in an unexpected way, I feel safer when I check the assumptions. I usually use isinstance and if later I want to support other types or ducks, I change the validation logic.

  2. Assertive programming. If my code is new, i use asserts a lot. What are your advices on this? Do you later remove asserts from the code?

  3. If my function/method is not part of an API, but passes some of its arguments through to a other code not written, studied or tested by me, I do a lot of asserts according to the called interface. My logic behind this – better fail in my code, then somewhere 10 levels deeper in stacktrace with incomprehensible error which forces be to debug a lot and later add the assert to my code anyway.

Comments and advices on when to use or not to use type/value validation, asserts? Sorry for not the best formulation of the question.

For example consider the following function, where Customer is a SQLAlchemy declarative model:

def add_customer(self, customer):
    """Save new customer into the database.
    @param customer: Customer instance, whose id is None
    @return: merged into global session customer
    """
    # no validation here at all
    # let's hope SQLAlchemy session will break if `customer` is not a model instance
    customer = self.session.add(customer)
    self.session.commit()
    return customer

So, there several ways to handle validation:

def add_customer(self, customer):
    # this is an API method, so let's validate the input
    if not isinstance(customer, Customer):
        raise ApiError('Invalid type')
    if customer.id is not None:
        raise ApiError('id should be None')

    customer = self.session.add(customer)
    self.session.commit()
    return customer

or

def add_customer(self, customer):
    # this is an internal method, but i want to be sure
    # that it's a customer model instance
    assert isinstance(customer, Customer), 'Achtung!'
    assert customer.id is None

    customer = self.session.add(customer)
    self.session.commit()
    return customer

When and why would you use each of these in the context of duck typing, type checking, data validation?

Best Answer

Let me give some guiding principles.

Principle #1. As outlined in http://docs.python.org/2/reference/simple_stmts.html the performance overhead of asserts can be removed with a command line option, while still being there for debugging. If performance is a problem, do that. Leave the asserts. (But don't do anything important in the asserts!)

Principle #2. If you're asserting something, and will have a fatal error, then use an assert. There is absolutely no value in doing something else. If someone later wants to change that, they can change your code or avoid that method call.

Principle #3. Do not disallow something just because you think it is a stupid thing to do. So what if your method allows strings through? If it works, it works.

Principle #4. Disallow things that are signs of likely mistakes. For instance consider being passed a dictionary of options. If that dictionary contains things that are not valid options, then that's a sign that someone didn't understand your API, or else had a typo. Blowing up on that is more likely to catch a typo than it is to stop someone from doing something reasonable.

Based on the first 2 principles, your second version can be thrown away. Which of the other two you prefer is a matter of taste. Which do you think more likely? That someone will pass a non-customer to add_customer and things will break (in which case version 3 is preferred), or that someone will at some point want to replace your customer with a proxy object of some kind that responds to all of the right methods (in which case version 1 is preferred).

Personally I've seen both failure modes. I'd tend to go with version 1 out of the general principle that I'm lazy and it is less typing. (Also that kind of failure usually tends to show up sooner or later in a fairly obvious way. And when I want to use a proxy object, I get really annoyed at people who have tied my hands.) But there are programmers I respect who would go the other way.

Related Topic