Using SQL Server Change Data Capture with a frequently changing schema

migration, sql-server

We are looking into enabling SQL Server Change Data Capture (CDC) for a new subsystem we are building.

It's not really because we need it, but we are being pushed to provide complete history traceability, and CDC would nicely solve this requirement with minimal effort on our part.

We are following an agile development process, which in this case means that we frequently make changes to the database schema, e.g. adding new columns, moving data to other columns, etc.

We did a small test where we created a table, enabled CDC for that table, and then added a new column to the table. Changes to the new column are not registered in the CDC table.

Is there a mechanism to update the CDC table to the new schema, and are there any best practices for how you deal with captured data when migrating the database schema?

Best Answer

We have also recently started looking at CDC. I'm not an expert on the subject, but I think I have some answers to your questions.

For the most part, CDC will help you achieve your goal of a completely traceable history, but I don't think it will get you all of the way there.

First off:

"we frequently make changes to the database schema ... Is there a mechanism to update the CDC table to the new schema"

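And this is where I think CDC will fail you. The MSDN documentation under the section "Understanding Change Tracking Overhead" is pretty clear that schema changes are not tracked for you. Strictly speaking, that section describes Change Tracking, a related but separate feature, so treat it as a guide rather than a statement about CDC. For example, with ALTER TABLE ADD COLUMN: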

"If a new column is added to the change tracked table, the addition of the column is not tracked. Only the updates and changes that are made to the new column are tracked."

Drop Column is a little bit more complex.

However, you should be using DB scripts to alter your schema, so you don't necessarily have to rely on CDC here. Scripts give you consistency between your QA and Production schemas: any change to QA should be performed by script so that exactly the same change can be applied to Prod. It shouldn't be too hard to extract the schema changes from those scripts. This may mean that the "time" dimension of your history is driven by version instead of actual time, but the end result will be the same.

If you don't have one already, create a table to track the version of your database schema. And then place that database schema version table under CDC so you can align macroscopic changes to the schema against the microscopic changes within a particular table.
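A version table like that could be sketched as follows. This is only an illustration: the table name dbo.SchemaVersion, its columns, and the sample script name are made up for the example, and it assumes CDC has already been enabled for the database with sys.sp_cdc_enable_db.

```sql
-- Hypothetical schema version table; one row per applied migration script.
CREATE TABLE dbo.SchemaVersion
(
    VersionId    int           NOT NULL PRIMARY KEY,
    ScriptName   nvarchar(260) NOT NULL,
    AppliedAtUtc datetime2(0)  NOT NULL
        CONSTRAINT DF_SchemaVersion_AppliedAtUtc DEFAULT (SYSUTCDATETIME())
);

-- Put the version table itself under CDC so that version bumps appear
-- in the same change stream as the data changes.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'SchemaVersion',
    @role_name     = NULL;   -- no gating role (assumption)

-- Each migration script would end by recording itself, for example:
INSERT INTO dbo.SchemaVersion (VersionId, ScriptName)
VALUES (42, N'042_add_notes_column.sql');   -- hypothetical values
```

Because the insert is captured with its own LSN, you can line up "schema version 42 went live" against the changes captured for any other table.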

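To my understanding, with CDC itself a capture instance keeps the column list it was created with, so the capture process ignores columns that are added later. That matches what you saw in your test. To capture the new column(s), you have to create a second capture instance for the table (a table can have at most two) and retire the old one once its data has been consumed. Data migrated from table to table should still be picked up by CDC, as long as the columns involved are part of a capture instance.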
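As far as I know, the usual way to get a newly added column into CDC is to rotate capture instances. A rough sketch, assuming a hypothetical dbo.Orders table that was enabled with the default capture instance name dbo_Orders and later gained a new column (all object names here are placeholders):

```sql
-- 1. See which DDL changes CDC has recorded for the existing capture instance.
EXEC sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_Orders';

-- 2. Create a second capture instance; it is built from the table's
--    current column list, so it includes the new column.
EXEC sys.sp_cdc_enable_table
    @source_schema    = N'dbo',
    @source_name      = N'Orders',
    @role_name        = NULL,               -- no gating role (assumption)
    @capture_instance = N'dbo_Orders_v2';   -- hypothetical name

-- 3. After consumers have read (or archived) everything they need from
--    the old change table, drop the old capture instance. This also
--    drops its change table, cdc.dbo_Orders_CT.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'Orders',
    @capture_instance = N'dbo_Orders';
```

Since step 3 throws away the old change table, a migration script that does this needs to copy out any history you want to keep first.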

"are there any best practices for how you deal with captured data when migrating the database schema?"

Treat it like you would treat an audit. You need to understand what it is you're examining, why you're examining it, and how long you need to keep that information around. Scope and retention are the two biggest bugaboos when it comes to a task like this.

CDC's reporting tools are understandably austere, so you have to know the context of the changes. It's too easy to say "track everything!" and end up with nothing that's usable as a result. Likewise, you could be doubling the size of your database by keeping a copy of every change. On a high-churn table with many inserts and deletes, you'll end up with astronomical growth. That's not bad in and of itself, but you need to budget for that growth and have a means to examine all of the data that's generated.

So this gets you back to understanding why you are being pushed to have complete traceability. There are certainly valid reasons for that requirement. But you won't be able to structure effective auditing of the database until you know why you must meet that requirement.
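One retention detail worth checking: by default the CDC cleanup job removes change rows after three days (4320 minutes), so a complete history means either raising the retention or copying change rows into your own archive table. A small sketch for inspecting and adjusting this; the 30-day figure and the capture instance name dbo_Orders are assumptions for the example:

```sql
-- Show the CDC capture and cleanup jobs; the retention column is in minutes.
EXEC sys.sp_cdc_help_jobs;

-- Keep change rows for 30 days (43200 minutes) instead of the default.
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 43200;

-- Check how much space one change table currently uses.
EXEC sys.sp_spaceused N'cdc.dbo_Orders_CT';
```

Running the space check on a schedule for your busiest tables gives you real numbers for the growth budget instead of a guess.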
