TDD – TDD with SQL and Data Manipulation Functions

databasesqltddtest-automation

While I'm a professional programmer, I've never been formally trained in software engineering. As I'm frequently visiting here and SO, I've noticed a trend for writing unit tests whenever possible and, as my software gets more complex and sophisticated, I see automated testing as a good idea in aiding debugging.

However, most of my work involves writing complex SQL and then processing the output in some way. How would you write a test to ensure your SQL was returning the correct data, for example? Then, say if the data wasn't under your control (e.g., that of a 3rd party system), how can you efficiently test your processing routines without having to hand write reams of dummy data?

The best solution I can think of is making views of the data that, together, cover most cases. I can then join those views with my SQL to see if it's returning the correct records and manually process the views to see if my functions, etc. are doing what they're supposed to. Still, it seems excessive and flakey; particularly finding data to test against…

Best Answer

An important rule to test everything that is database related is to completely isolate it from the rest of your application.

The ports and adapters architecture is a really good example. The database is regarded as an external plugin through an adapter to your application. The same goes with all the 3rd party subsystems. For testing how your app would behave and interpret the responses of 3rd party subsystems the only way I know how to test that is to stub the responses of this individual subsystem. I does not necessarily mean that you would have to manually write all the data objects. You can easily take the approach of using data driven testing.

In regard to testing how your application interacts with your database you can fake out the database adapters to use an in-memory database for example.

Now testing your database queries. First of all all the complex queries should be decomposed in more easy, simple and predictable queries. The same you would do for a fat class or for a fat function. There are tools that can help you testing your database like Dbunit. A simple approach I sometimes take is to use the concept of characterization tests. So I would put the database in a know state, run all the queries I have to write save the output in a place(file, memory) and consider this output to be the correct one. The next runs would compare their output with this one, so that would definitely offer me the regression testing i need. Indeed the first output is not guaranteed to be correct but the regression problem can be solved this way. If you have your queries well decomposed you can test them individually towards the database that is in a known state.

Related Topic