Persistent Storage – Strategy for Backwards Compatibility of Persistent Storage

Tags: backward compatibility, language-agnostic, persistence

In my experience, trying to ensure that new versions of an application retain compatibility with data storage from previous versions can often be a painful process.

What I currently do is to save a version number for each 'unit' of data (be it a file, database row/table, or whatever) and ensure that the version number gets updated each time the data changes in some way. I also create methods to convert from v1 to v2, v2 to v3, and so on. That way, if I'm at v7 and I encounter a v3 file, I can do v3->v4->v5->v6->v7.
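A minimal sketch of the chained-conversion idea described above, assuming each unit of data is loaded as a plain dict carrying a "version" key. The field names and the two converters are hypothetical and exist only to illustrate the chaining:

```python
CURRENT_VERSION = 3


def v1_to_v2(record):
    # Hypothetical change: v2 split "name" into "first" and "last".
    first, _, last = record["name"].partition(" ")
    return {"version": 2, "first": first, "last": last}


def v2_to_v3(record):
    # Hypothetical change: v3 added an optional "email" field.
    return {**record, "version": 3, "email": None}


# Maps a version number to the step that upgrades *from* that version.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(record):
    """Apply each single-step converter in order until the record is current."""
    while record["version"] < CURRENT_VERSION:
        record = MIGRATIONS[record["version"]](record)
    return record
```

Here a v1 record passes through v1_to_v2 and then v2_to_v3, while a record that is already at v3 is returned unchanged.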

So far this approach seems to be working out well, but I haven't had to make use of it extensively yet, so there may be unforeseen problems. I'm also concerned that if the objects I'm loading change significantly, I'll either have to keep around old versions of the classes or face updating all my conversion methods to handle the new class definition.

Is my approach sound? Are there other/better approaches I could be using? Are there any design patterns applicable to this problem?

Best Answer

You're doing it right. You're stamping data with its version, which means you have a definite interpretation of it. The only open question is how to handle "old" data. Your choices are essentially between upgrading data where it lives, having your code adapt the data in real time, or having the code handle multiple data versions. From 30+ years of experience, I can tell you the first of these is the only sane way to go. Bite the bullet and write a conversion routine for each step along the history, and run them in sequence. If you find that a later step obviates an earlier one (e.g., why upgrade the rows in a table if a later step deletes the table?), resist the temptation to short-circuit things unless there is a large, demonstrable performance gain in the update process.
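As a rough illustration of "upgrading data where it lives", here is a sketch that assumes the data is a JSON file whose top-level object has a "version" key. The function name upgrade_file_in_place and the shape of the migrations mapping are assumptions made for this example, not something the answer prescribes:

```python
import json
import os
import tempfile


def upgrade_file_in_place(path, migrations, current_version):
    """Upgrade the JSON document at `path` one version step at a time.

    `migrations` maps a version number to a function that takes a dict at
    that version and returns a dict at the next version.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    if data["version"] >= current_version:
        return data  # nothing to do; leave the file untouched

    # Run every step in sequence, never skipping ahead.
    while data["version"] < current_version:
        data = migrations[data["version"]](data)

    # Write to a temporary file first and then swap it in, so that an
    # interrupted write does not overwrite the original with a partial file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp_path, path)
    return data
```

After this runs, the rest of the application only ever sees data at the current version, which is the main appeal of converting in place rather than adapting on every read.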

Related Solutions

Programming Languages – Backward Compatibility vs Fixing Flaws

It sounds fine, but rarely works out in practice; people are extremely reluctant to change running code, and even for new, green-field projects they are very reluctant to switch away from a language/version that they already know.

Changing existing, running code that "works fine" is not something that ranks high on any project's priority list. Rather than applying effort to things that the managers thought had been paid for already, just to be able to upgrade to a newer release of a language or platform, they will decree that the developers should just stay on the old release "for now". You can try to entice your users with great features only available in the new release, but it's a gamble where you risk decreasing your user base for no clear gain for the language. In popular opinion, cool, modern features cannot easily be weighed against the price of a fragmented installation base, and you run the risk of getting a reputation for being an "upgrade treadmill" that requires constant effort to keep running when compared to more relaxed languages/platforms.

(Obviously, most of this doesn't apply to projects written by hobbyists just for their own pleasure. However (here be flamebait...) PHP is disproportionally rarely chosen by hackers because it's such a pleasure to write with in the first place.)

Python – Changing method signature while keeping backwards compatibility

In Python, it's "Easier to ask for forgiveness than permission": it is common "Pythonic" practice to handle potential problems with exceptions and error handling rather than with if checks up front ("Look before you leap"). The documentation provides a few examples that demonstrate where the latter can really cause problems: if the situation changes between the look and the leap, you have serious trouble!

On that basis, and given that a function will raise a TypeError if provided with the wrong number of arguments, you could use:

try:
    # Have a go with the new interface
    self._callback(data, additional_arg)
except TypeError:
    # Fall back to the old one
    self._callback(data)

You could use a decorator function to wrap any callback:

import functools

def api_compatible(func):
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        try:
            # Have a go with the new interface
            return func(data, *args, **kwargs)
        except TypeError:
            # Fall back to the old one
            return func(data)
    return wrapper

Now it becomes:

self._callback = api_compatible(callback)
...
self._callback(data, additional_arg)
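To see the fallback in action, here is a small self-contained sketch that repeats the decorator and runs it against one callback written for each interface. The callbacks old_style and new_style are hypothetical stand-ins for user-supplied code:

```python
import functools


def api_compatible(func):
    """Let a callback written for the old one-argument interface accept the new call."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        try:
            return func(data, *args, **kwargs)
        except TypeError:
            return func(data)
    return wrapper


def old_style(data):
    # Hypothetical callback written against the old interface.
    return f"old:{data}"


def new_style(data, additional_arg):
    # Hypothetical callback written against the new interface.
    return f"new:{data}:{additional_arg}"


results = [api_compatible(cb)("payload", "extra") for cb in (old_style, new_style)]
print(results)  # ['old:payload', 'new:payload:extra']
```

One caveat with this approach: a TypeError raised inside the body of a new-style callback also triggers the fallback, so a genuine bug in the callback can end up reported as a missing-argument error from the retry.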
