Favoring Immutability in Database Design

database-design, immutability

One of the items in Joshua Bloch's Effective Java is the advice that classes should allow as little mutation of their instances as possible, and preferably none at all.

Oftentimes, an object's data is persisted to a database of some form. This has led me to think about the idea of immutability within a database, especially for tables that represent a single entity within a larger system.

Something I have been experimenting with recently is minimizing the updates I make to the table rows representing these objects, and performing inserts instead wherever I can.

Here is a concrete example of something I was experimenting with recently: if I know I might append additional data to a record later on, I'll create a separate table to hold it, along the lines of the following two table definitions:

create table myObj (
    id integer primary key
    -- ...other_data... not null
);

create table myObjSupplement (
    id integer primary key,
    myObjId integer not null references myObj (id)
    -- ...more_data... not null
);

These names are of course not verbatim; they are only meant to demonstrate the idea.
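To make the insert-over-update idea concrete, here is a minimal sketch of appending and reading the supplementary data. The columns other_data and more_data are hypothetical stand-ins for the elided columns above, and the literal values are made up:

-- Later, when the extra data arrives, append it instead of updating myObj:
insert into myObjSupplement (id, myObjId, more_data)
values (1, 42, 'late-arriving value');

-- Read the combined record with a left join, which tolerates a missing supplement:
select o.id, o.other_data, s.more_data
from myObj o
left join myObjSupplement s on s.myObjId = o.id
where o.id = 42;

Until the supplement row exists, the left join simply returns NULL for more_data -- the same information a nullable column would carry, but without ever updating the original row.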

Is this a reasonable approach to modeling data persistence? Is it worth trying to limit the updates performed on a table, especially updates that fill in NULLs for data that didn't exist when the record was originally created? Are there situations where an approach like this might cause severe pain later on?

Best Answer

The primary purpose of immutability is to ensure that there is no instant in time when the data in memory is in an invalid state. (The other reason is that mathematical notation is mostly static, so immutable things are easier to conceptualize and model mathematically.) In memory, if one thread reads or writes data while another is still working on it, the data can end up corrupt, or the reader can observe it in an inconsistent state. If updating an object requires multiple assignments to its fields, another thread in a multithreaded application might try to work with the object somewhere in between those assignments -- which could be bad.

Immutability remedies this by first writing all the changes to a new place in memory, and then making the switch in one fell swoop: the pointer to the object is rewritten to point at the new version, which for an aligned, word-sized pointer is an atomic operation on virtually every CPU.

Databases do the same thing using atomic transactions: when you start a transaction, the database writes all the new updates to a new place on disk. When you commit the transaction, it switches the on-disk pointer to the new data -- a brief step during which other processes can't observe a half-finished state.
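For instance, here is a minimal sketch of grouping two inserts into one atomic unit, using the hypothetical tables from the question (the exact begin/commit syntax varies by database system, and more_data is again a hypothetical column):

begin;

insert into myObj (id) values (43);
insert into myObjSupplement (id, myObjId, more_data)
values (2, 43, 'written together with the base row');

-- At commit, other readers see both rows or neither, never a half-finished state.
commit;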

This is essentially the same as your idea of creating new tables, only more automatic and more flexible.

So to answer your question: yes, immutability is good in databases, but no, you don't need to create separate tables just for that purpose; you can simply use whatever atomic transaction commands your database system provides.
