Database Design – How to Argue Against Duplicating Database Columns

database-design

I've started working at a new organization and one of the patterns I've been seeing in the database is duplicating fields to make writing queries easier for the business analysts. We're using Django and its ORM.

In one case, we keep a MedicalRecordNumber object with a unique string identifying a patient in a certain context. We have Registration objects which track patients and have associated MedicalRecordNumbers, but rather than using a foreign key relationship, they duplicate the string so they can avoid writing a join (not for performance reasons). This pattern is common throughout the database.

For me the importance of a data model being clean is just so I can think about it well. Needless complexity is a waste of my limited cognitive processing time. It's a systematic problem. Not being comfortable writing joins is a rectifiable skills issue. I don't necessarily want to advocate going back and changing the schema, but I'd love to be able to convincingly articulate the problems with this type of duplication.

Best Answer

Your operational database should be highly normalized, to reduce anomalies.

Your analytic database (warehouse) should be highly denormalized, to ease analysis.

If you don't have a separate analytic database, you should make some highly denormalized [materialized] views.

If you tell your senior business analysts / managers to do lots of joins for a simple analysis, well, you might get fired.

Agile Data Warehouse Design is a good book

See my quick n' dirty data warehouse tips here

Related Topic