Predicting advantages of database denormalization

database-design, rdbms, relational-database

I was always taught to strive for the highest Normal Form of database normalization, and we were taught Bernstein's Synthesis algorithm to achieve 3NF. This is all very well and it feels nice to normalize your database, knowing that fields can be modified while retaining consistency.

However, performance may suffer. That's why I am wondering whether there is any way to predict the speedup/slowdown when denormalizing. That way, you can build your list of FDs in 3NF and then denormalize as little as possible. I imagine that denormalizing too much would waste space and time, because e.g. giant blobs are duplicated, or it becomes harder to maintain consistency because you have to update multiple fields in a single transaction.

Summary: Given a 3NF FD set and a set of queries, how do I predict the speedup/slowdown of denormalization? Links to papers are appreciated too.

Best Answer

You would have to know the dataflows between the tables to be able to see how the DB model performs. Once you have that, you can estimate the change in performance for a given denormalization (e.g. if you decide to duplicate data).
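As a rough illustration, here is a minimal back-of-envelope sketch in Python. All of the figures (reads_per_sec, join_cost, extra_write_cost, and so on) are made-up assumptions standing in for numbers you would measure from your own workload; the point is only to show how read savings and write overhead trade off.

```python
# Back-of-envelope cost model: compare a normalized read (a join) against a
# denormalized read (single-table access) plus the extra write cost of
# keeping duplicated data in sync. All figures are illustrative assumptions.

# Hypothetical workload parameters
reads_per_sec = 500          # queries that currently need the join
writes_per_sec = 20          # updates that would also have to touch the duplicate
join_cost = 3.0              # relative cost of the 2-table join per read
single_table_cost = 1.0      # relative cost of reading the denormalized row
extra_write_cost = 2.0       # relative cost of updating the duplicated copy

normalized_load = reads_per_sec * join_cost
denormalized_load = (reads_per_sec * single_table_cost
                     + writes_per_sec * extra_write_cost)

print(f"normalized:   {normalized_load:.0f} cost units/sec")
print(f"denormalized: {denormalized_load:.0f} cost units/sec")
print(f"estimated speedup: {normalized_load / denormalized_load:.2f}x")
```

With these particular (assumed) numbers the denormalized layout wins, but flip the read/write ratio and the conclusion flips too, which is exactly why you need the dataflows first.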

Some rough estimates can be deduced from how many new indexes you would need after the denormalization steps. Each new index must be updated and queried separately, which incurs a performance hit proportional to the number of new indexes.
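A similarly hedged sketch of that proportional overhead: the per-index cost factor below is an assumed calibration constant, not a measured value, and you would benchmark it on your own hardware and engine.

```python
# Rough index-maintenance estimate, assuming each additional secondary index
# adds a roughly constant cost to every INSERT/UPDATE on the table.

def write_cost(base_row_cost: float, n_indexes: int, per_index_cost: float = 0.3) -> float:
    """Estimated cost of one write when n_indexes secondary indexes must be maintained."""
    return base_row_cost * (1 + per_index_cost * n_indexes)

before = write_cost(base_row_cost=1.0, n_indexes=2)  # indexes on the normalized table
after = write_cost(base_row_cost=1.0, n_indexes=5)   # extra indexes after denormalizing
print(f"write cost before: {before:.2f}, after: {after:.2f}, "
      f"overhead: {after / before - 1:.0%}")
```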

Big blobs of binary data should in any case be stored in a separate table and not copied around. They are (usually) not queried but returned as part of the final result set after a query against some other set of tables.
