Testing BI (Business Intelligence) systems on the MS stack

business-intelligencedata-warehousemicrosoft

I'm placing my first steps in the Business Intelligence sector and I'm looking for good ways to test BI systems. I'm going to try to explain what my system consists of so that people unfamiliar with BI systems could also pitch in some advice.

For those of you who are rather unfamiliar with BI systems, here's how a BI system typically looks like on the MS stack:

  1. Data is in MS SQL DBs, text files, XML files, etc
  2. The data is put into a single MS SQL DB (the
    data warehouse
    ) using (an) Sql
    Server Integration Services (SSIS)

    package(s). (ETL, Extract – Transform –
    Load)
  3. A 'Cube' containing
    Measurements, KPIs etc is processed
    on the data warehouse, using Sql
    Server Analysis Services (SSAS)

  4. Reporting can be built on the cube
    using excel, Reporting services,
    etc.

There are multiple things you can/should test:

  • Is the data put into the data warehouse correctly
  • Are the measurements and calculations done correctly

I think a test environment could be made in .NET using empty databases with the same structure as the production databases and data warehouse.

Tests could be written like this:

  1. Insert dummy data in the production databases of which you know that they should produce certain measurements, KPI values. For example: Insert a car with cost 100 and profit 110 sold in 2008. another car with cost 100 and profit 120 sold in 2009. markup in 2008 should be 10% and in 2009 20%.
  2. Run the SSIS package(s). I looked this up and its possible in .net
  3. Run the SSAS calculations. Still have to look up how to do it, but should be possible.
  4. Retrieve the KPI data from the Analysis Services database and test if it matches the expected results.

Some downsides I expect to this approach:

  1. Setting up test databases, security for the test script, … will take quite some time.
  2. Changes to production databases and the data warehouse will have to be copied to the test environment.
  3. Creating the insert statements for the test data could be difficult. In my car example it is simple but for more complex calculations this could be quite time consuming
  4. Tests will most likely run quite slow
  5. The tests are error prone, either in the INSERT statements or in the part where you retrieve the KPI data. Figuring out what went wrong could be difficult.

So summarized, writing and maintaining these tests could be quite consuming.

On the other hand, you would have a nice test-driven approach where you write the test first, plus your business logic is explained and documented in the tests.

Since I have not much practical experience with BI development I'm unsure if this is the way to go and what alternatives there are. Perhaps someone more experienced could explain how BI systems are usually tested and documented?

Best Answer

There's nothing wrong with your thinking or your approach, except one thing is missing.

Usually BI systems are used to generate information for decision making purposes from a huge ball of data (note the difference between information and data). This information is then further condensed to the point where management can connect relevant dots and make strategic or tactical decision. Usually these decisions involve a fair deal of money.

And now we get to the point: How much money will be lost if one of these decisions is wrong because the information provided was incorrect? That is a starting point for a conversation with the stakeholders about the amount of time (which can be expressed in money) they will want you to spend on automating the tests.

You should not underestimate the fact that BI generally works with loads of data and then uses statistical analysis. Your tests will have to simulate this.

Related Topic