R – How to prevent orphaned records in detail tables of normalized database

database-designnormalizing

I have to maintain an old database which is not properly normalized. For instance there is a project table that has grown (or maybe mushroomed) to have 5 or more different date columns, for different milestones of the project from being ordered to the delivery date. There are also several tables each with columns for street addresses, mail addresses or web links.

I would like to normalize the structure, create tables for addresses, scheduled dates and the like, and the necessary tables to allow for 1:N relations (address per customer, due date per project and so on).

Right now I'm completely unsure how to handle changes to the data in the detail tables. Consider for example the change of a customer delivery address. Changing the data in the address table is out of the question, because more than one record (possibly in more than one table) could reference that record. Adding a new address record could leave the old record orphaned if no other row has a foreign key relation to it.

I have thought about the following ways to handle this:

  • Add a new detail record, and check in an update trigger of the master table whether the old detail record has to be deleted. This would require knowledge about all tables that have relations to the detail table, in all of them or in a sproc. I don't like this loss of separation. It would also involve more tables in the active transaction.

  • Let the trigger try to delete the old detail record, and catch any errors. This just feels wrong.

  • Live with the orphaned record, and have a periodic maintenance task clean up all detail tables.

What is the preferred way to handle data changes in detail tables that are linked to several master tables? Any tips for reading up on this?

Best Answer

Part of the problem may be the original schema design: the foreign keys point the wrong way, treating addresses, phone numbers, etc. as master instead of detail. This may be convenient when you want all uses of a given address to update at once, but in my experience it always devolves into too many difficult exceptional cases, for example one person at a location moves so you need to break their link vs an entire household or office moving so you update the existing record. If you try to hide this detail from the user on the CRUD screen, you'll end up with a situation where it just doesn't do what you want.

If it's done that way just to collapse duplicate values, it's effectively a denormalization of the database: the mere existence of the address row is meaningless. The only difference is that unlike most denormalizations, it attempts to gain space efficiency instead of speed. Creating a link table at that point is simply compounding the problem.

If you want, for example, multiple addresses per contact, make the addresses a detail table with a foreign key pointing back to the parent contact, and don't worry about duplicated address values because they're just values. Otherwise, make Address a real entity: add a title or description field and a CRUD screen so it can stand on its own as an entity.