Favoring Immutability in Database Design

database-design, immutability

One of the items in Joshua Bloch's Effective Java is the advice that classes should allow as little mutation of their instances as possible, and preferably none at all.

Oftentimes, an object's data is persisted to a database of some form. This has led me to think about the idea of immutability within a database, especially for tables that represent a single entity within a larger system.

Something I have been experimenting with recently is minimizing the updates I make to the table rows representing these objects, and performing inserts instead wherever I can.

Here is a concrete example of something I was experimenting with recently: if I know I might append additional data to a record later on, I'll create a separate table to hold it, along the lines of the following two table definitions:

create table myObj (
    id integer primary key
    -- ...other_data... not null
);

create table myObjSupplement (
    id integer primary key,
    myObjId integer not null references myObj (id)
    -- ...more_data... not null
);

These names are of course not verbatim; they are only meant to demonstrate the idea.
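To make the insert-over-update idea concrete, here is a minimal sketch of appending and reading the supplementary data. The columns other_data and more_data are hypothetical stand-ins for the elided columns above, and the literal values are made up:

-- Later, when the extra data arrives, append it instead of updating myObj:
insert into myObjSupplement (id, myObjId, more_data)
values (1, 42, 'late-arriving value');

-- Read the combined record with a left join, which tolerates a missing supplement:
select o.id, o.other_data, s.more_data
from myObj o
left join myObjSupplement s on s.myObjId = o.id
where o.id = 42;

Until the supplement row exists, the left join simply returns NULL for more_data -- the same information a nullable column would carry, but without ever updating the original row.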

Is this a reasonable approach to modeling data persistence? Is it worth trying to limit the updates performed on a table, especially updates that fill in NULLs for data that didn't exist when the record was originally created? Are there situations where an approach like this might cause severe pain later on?

Best Answer

The primary purpose of immutability is to ensure that there is no instant in time when the data in memory is in an invalid state. (The other reason is that mathematical notation is mostly static, so immutable things are easier to conceptualize and model mathematically.) In memory, if one thread reads or writes data while another is still working on it, the data can end up corrupt, or the reader can observe it in an inconsistent state. If updating an object requires multiple assignments to its fields, another thread in a multithreaded application might try to work with the object somewhere in between those assignments -- which could be bad.

Immutability remedies this by first writing all the changes to a new place in memory, and then making the switch in one fell swoop: the pointer to the object is rewritten to point at the new version, which for an aligned, word-sized pointer is an atomic operation on virtually every CPU.

Databases do the same thing using atomic transactions: when you start a transaction, the database writes all the new updates to a new place on disk. When you commit the transaction, it switches the on-disk pointer to the new data -- a brief step during which other processes can't observe a half-finished state.
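For instance, here is a minimal sketch of grouping two inserts into one atomic unit, using the hypothetical tables from the question (the exact begin/commit syntax varies by database system, and more_data is again a hypothetical column):

begin;

insert into myObj (id) values (43);
insert into myObjSupplement (id, myObjId, more_data)
values (2, 43, 'written together with the base row');

-- At commit, other readers see both rows or neither, never a half-finished state.
commit;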

This is essentially the same as your idea of creating new tables, only more automatic and more flexible.

So to answer your question: yes, immutability is good in databases, but no, you don't need to create separate tables just for that purpose; you can simply use whatever atomic transaction commands your database system provides.
