Database – Using a single table for identity and metadata

database, database-design, schema, sql

I'm in the early design phase of a project to build an e-commerce platform that will need several entities to be modelled: products, customers, orders, CMS pages, and so on. They will all have a few things in common (ID, creation timestamp, last-modified timestamp, etc.).

My first thought was the usual one: give each table an ID column that uses the database's own mechanism for generating unique values (AUTO_INCREMENT in MySQL, sequences in Postgres, etc.). But given that the entities have a few things in common, I was considering a design where all the shared data is kept in a base BusinessObject table, and the table for each other entity has a primary key that is also a foreign key referencing BusinessObject.

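For example (sketched in PostgreSQL syntax; the column types are only placeholders):

CREATE TABLE BusinessObject (
    id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    date_created TIMESTAMPTZ NOT NULL DEFAULT now(),
    date_updated TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_deleted   BOOLEAN NOT NULL DEFAULT FALSE
    -- etc
);

CREATE TABLE Customer (
    -- primary key that is also a foreign key to the shared identity table
    id       BIGINT PRIMARY KEY REFERENCES BusinessObject (id),
    forename TEXT NOT NULL,
    surname  TEXT NOT NULL
    -- etc
);

CREATE TABLE Product (
    -- primary key that is also a foreign key to the shared identity table
    id          BIGINT PRIMARY KEY REFERENCES BusinessObject (id),
    name        TEXT NOT NULL,
    price       NUMERIC(10, 2) NOT NULL,
    description TEXT
    -- etc
);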

and so on.

I can think of a number of advantages to this approach. First, a particular ID only ever maps onto one particular object. In a system where each table generates its own IDs, the ID 3 could refer to a customer, an order or anything else. In the above design, if ID 3 was assigned to an order then it will always be an order, because there can never be a customer or product with ID 3. This would make things like working out the referenced business object from a URL a lot easier, allowing for simpler routing in the application layer.

However, it also means that every table in the system must join against the BusinessObject table, and I'm worried that this would bring some significant drawbacks. For example, one particular table would be involved in nearly all queries, which may degrade performance for that table. It would also be possible for a row in Customer to reference the same row in BusinessObject as a row in Product, resulting in a loss of data integrity unless some additional steps are taken to prevent it.
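One commonly used way to take those "additional steps" is a type discriminator: BusinessObject records what kind of object each ID belongs to, and each child table references the (id, type) pair rather than the ID alone. The following is only a minimal sketch of that idea, run against an in-memory SQLite database through Python's standard `sqlite3` module. The `object_type` column and the helper function names are assumptions made for illustration and are not part of the design above.

```python
import sqlite3

# Assumed schema: object_type is a hypothetical discriminator column.
SCHEMA = """
CREATE TABLE BusinessObject (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    object_type TEXT NOT NULL CHECK (object_type IN ('customer', 'product')),
    UNIQUE (id, object_type)  -- target of the composite foreign keys below
);
CREATE TABLE Customer (
    id          INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL DEFAULT 'customer' CHECK (object_type = 'customer'),
    forename    TEXT NOT NULL,
    surname     TEXT NOT NULL,
    FOREIGN KEY (id, object_type) REFERENCES BusinessObject (id, object_type)
);
CREATE TABLE Product (
    id          INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL DEFAULT 'product' CHECK (object_type = 'product'),
    name        TEXT NOT NULL,
    FOREIGN KEY (id, object_type) REFERENCES BusinessObject (id, object_type)
);
"""


def open_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def new_object(conn, object_type):
    """Allocate a system-wide ID of the given type and return it."""
    cur = conn.execute(
        "INSERT INTO BusinessObject (object_type) VALUES (?)", (object_type,)
    )
    return cur.lastrowid


def product_can_reuse_customer_id(conn):
    """Try to attach a Product row to an ID that already belongs to a Customer."""
    customer_id = new_object(conn, "customer")
    conn.execute(
        "INSERT INTO Customer (id, forename, surname) VALUES (?, 'Ada', 'Lovelace')",
        (customer_id,),
    )
    try:
        conn.execute(
            "INSERT INTO Product (id, name) VALUES (?, 'Widget')", (customer_id,)
        )
    except sqlite3.IntegrityError:
        return False  # the composite foreign key rejected the mismatched type
    return True
```

Calling `product_can_reuse_customer_id(open_db())` exercises the failure case: the Product row would need a BusinessObject row with the same ID and type 'product', which does not exist. The cost is an extra column in every child table, and the join against BusinessObject remains.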

So basically, what are the pros and cons of a design where a single table provides the identity data for most of the rest of the database? Are such designs fairly common or is it better to just have each table have its own identity source and rely on cleverer application logic to determine the object being referenced?

Best Answer

You are designing your system in the wrong order. You need to develop your business objects first. I can tell you are developing your DB first because you are trying to impose patterns on it that belong in your business objects.

You are also making the common mistake of thinking that there can only be one ID because you are thinking in terms of primary keys.

Also, it is not clear to me that those things are as related as you think they are.

Assuming they do share a pattern, you should define interfaces to express that: IHasBusinessEntityId, ICreatedDate, IModifiedDate. Please do not forget the interface segregation principle. Then perhaps you can aggregate those interfaces into an IAuditable interface?
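As a rough illustration of that interface layering, here is a sketch in Python, with `typing.Protocol` classes standing in for the interfaces named above. The attribute names and the `Customer` and `CmsPage` classes are assumptions for the example only, not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Protocol, runtime_checkable
from uuid import UUID, uuid4


def _now():
    """Current time in UTC, used as the default for the timestamps."""
    return datetime.now(timezone.utc)


# Small, single-purpose interfaces (interface segregation).
class HasBusinessEntityId(Protocol):
    business_entity_id: UUID


class HasCreatedDate(Protocol):
    date_created: datetime


class HasModifiedDate(Protocol):
    date_updated: datetime


# The aggregate: anything with all three attributes counts as auditable.
@runtime_checkable
class Auditable(HasBusinessEntityId, HasCreatedDate, HasModifiedDate, Protocol):
    pass


@dataclass
class Customer:
    """Hypothetical business object that satisfies Auditable."""
    surname: str
    business_entity_id: UUID = field(default_factory=uuid4)
    date_created: datetime = field(default_factory=_now)
    date_updated: datetime = field(default_factory=_now)


@dataclass
class CmsPage:
    """Hypothetical business object that deliberately has no audit fields."""
    title: str
```

With these definitions, `isinstance(Customer("Lovelace"), Auditable)` is true while `isinstance(CmsPage("About"), Auditable)` is false, so code that only cares about auditing can accept any `Auditable` without the objects sharing a base class or a base table.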

Explore the patterns in your business objects first, then you can concentrate on making your table structure fast and efficient.

Each of your tables can have a BusinessEntityId column (perhaps a GUID, or ints generated by a SEQUENCE) and, in addition, its own sequential int identifier to use as the clustered index (GUIDs are not great for clustered indexes). You can then use a SQL UNION to bring back all your IAuditable objects from the disparate tables.

I would very much recommend against tying all of these to one table, as it will cost you flexibility and cause performance problems.
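A minimal sketch of that layout, again using Python's standard `sqlite3` module with an in-memory database so it can be run as-is. The column names, the GUID stored as text, and the helper functions are assumptions for illustration. It uses UNION ALL rather than plain UNION because the rows from different tables are already distinct, so there is nothing to de-duplicate.

```python
import sqlite3
import uuid

# Each table keeps its own sequential key plus a system-wide identifier.
SCHEMA = """
CREATE TABLE Customer (
    customer_id        INTEGER PRIMARY KEY,   -- table-local sequential key
    business_entity_id TEXT NOT NULL UNIQUE,  -- system-wide ID (GUID as text)
    date_created       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    date_updated       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    surname            TEXT NOT NULL
);
CREATE TABLE Product (
    product_id         INTEGER PRIMARY KEY,   -- table-local sequential key
    business_entity_id TEXT NOT NULL UNIQUE,  -- system-wide ID (GUID as text)
    date_created       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    date_updated       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    name               TEXT NOT NULL
);
"""

# One result set of "auditable" rows gathered from the separate tables.
AUDIT_QUERY = """
SELECT business_entity_id, 'customer' AS object_type, date_created, date_updated
FROM Customer
UNION ALL
SELECT business_entity_id, 'product', date_created, date_updated
FROM Product
"""


def build_demo():
    """Create the schema and insert one customer and one product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO Customer (business_entity_id, surname) VALUES (?, ?)",
        (str(uuid.uuid4()), "Lovelace"),
    )
    conn.execute(
        "INSERT INTO Product (business_entity_id, name) VALUES (?, ?)",
        (str(uuid.uuid4()), "Widget"),
    )
    return conn


def auditable_rows(conn):
    """Return (business_entity_id, object_type, date_created, date_updated) rows."""
    return conn.execute(AUDIT_QUERY).fetchall()
```

Calling `auditable_rows(build_demo())` returns one row per business object across both tables. Ordinary queries against Customer or Product never touch a shared table, and the `object_type` literal in the query tells the application which table each row came from.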

This approach will result in clustered index fragmentation, which can drastically slow queries.
