Best Practices for Data Retrieval in Domain-Rich Applications

Architecturedata-warehousedomain-driven-designmvcreporting

First, I want to say this seems to be a neglected question/area, so if this question needs improvement, help me make this a great question that can benefit others! I'm looking for advice and help from people who have implemented solutions that solve this problem, not just ideas to try.

In my experience, there are two sides of an application – the "task" side, which is largely domain driven and is where the users interact richly with the domain model (the "engine" of the application) and the reporting side, where users get data based on what happens on the task side.

On the task side, it's clear that an application with a rich domain model should have business logic in the domain model and the database should be used mostly for persistence. Separation of concerns, every book is written about it, we know what to do, awesome.

What about the reporting side? Are data warehouses acceptable, or are they bad design because they incorporate business logic in the database and the very data itself? In order to aggregate the data from the database into data warehouse data, you must have applied business logic and rules to the data, and that logic and rules didn't come from your domain model, it came from your data aggregating processes. Is that wrong?

I work on large financial and project management applications where the business logic is extensive. When reporting on this data, I will often have a LOT of aggregations to do to pull the information required for the report/dashboard, and the aggregations have a lot of business logic in them. For performance sake, I have been doing it with highly aggregated tables and stored procedures.

As an example, let's say a report/dashboard is needed to show a list of active projects (imagine 10,000 projects). Each project will need a set of metrics shown with it, for example:

  1. total budget
  2. effort to date
  3. burn rate
  4. budget exhaustion date at current burn rate
  5. etc.

Each of these involves a lot of business logic. And I'm not just talking about multiplying numbers or some simple logic. I'm talking about in order to get the budget, you have to apply a rate sheet with 500 different rates, one for each employee's time (on some projects, other's have a multiplier), applying expenses and any appropriate markup, etc. The logic is extensive. It took a lot of aggregating and query tuning to get this data in a reasonable amount of time for the client.

Should this be run through the domain first? What about performance? Even with straight SQL queries, I'm barely getting this data fast enough for the client to display in a reasonable amount of time. I can't imagine trying to get this data to the client fast enough if I am rehydrating all these domain objects, and mixing and matching and aggregating their data in the application layer, or trying to aggregate the data in the application.

It seems in these cases that SQL is good at crunching data, and why not use it? But then you have business logic outside your domain model. Any change to the business logic will have to be changed in your domain model and your reporting aggregation schemes.

I'm really at a loss for how to design the reporting/dashboard part of any application with respect to domain driven design and good practices.

I added the MVC tag because MVC is the design flavor du jour and I am using it in my current design, but can't figure out how the reporting data fits into this type of application.

I'm looking for any help in this area – books, design patterns, key words to google, articles, anything. I can't find any information on this topic.

EDIT AND ANOTHER EXAMPLE

Another perfect example I ran across today. Customer wants a report for Customer Sales Team. They want what seems like a simple metric:

For each Sales person, what is their annual sales to date?

But that is complicated. Each sales person participated in multiple sales opportunities. Some they won, some they didn't. In each sales opportunity, there are multiple sales people who are each allocated a percentage of credit for the sale per their role and participation. So now imagine going through the domain for this…the amount of object rehydration you would have to do to pull this data from the database for every Sales person:

Get all the SalesPeople->
For each get their SalesOpportunities->
For each get their percentage of the sale and calculate their Sales amount
then Add up all their SalesOpportunity Sales amount.

And that's ONE metric. Or you can write a SQL query which can do it quickly and efficiently and tune it to be fast.

EDIT 2 – CQRS Pattern

I've read about the CQRS Pattern and, while intriguing, even Martin Fowler says it's not tested. So how has this problem BEEN solved in the past. This has to have been faced by everyone at some point or another. What is an established or well-worn approach with a track record of success?

Edit 3 – Reporting Systems/Tools

Another thing to consider in this context is reporting tools. Reporting Services/Crystal Reports, Analysis Services and Cognoscenti, etc. all expect data from SQL/database. I doubt your data will come through your business later for these. And yet they and others like them are a vital part of the reporting in a lot of large systems.
How is the data for these properly handled where there is even business logic in the data source for these systems as well as possibly in the reports themselves?

Best Answer

This is a very glib answer, but getting right to the heart of the matter:

In terms of DDD maybe think of reporting as a Bounded Context?, so rather than thinking in terms of "THE" domain model, you should be willing to think that it's okay to have more than one model. So yes it's okay if the reporting domain has reporting business logic in it, just as it's okay for the transactional domain to have transactional business logic in it.

As to the question of, say SQL stored procedures vs. domain model in application code, the same pros and cons apply for the reporting system as for the transactional system.

Since I see that you added a bounty to the question, I read the question again and noticed that you are asking for specific resource on this, so I thought I'd start with suggesting that you look at other Stack Overflow questions on the matter, and I found this one https://stackoverflow.com/questions/11554231/how-does-domain-driven-design-handle-reporting

The general gist of that one is to use CQRS as a pattern for your system, which is consistent with DDD, and rely on the query side responsibilities as a way to get reporting done, but I'm not sure that is a helpful answer in your case.

I also found this http://www.martinfowler.com/bliki/ReportingDatabase.html, which I found linked from here: http://groups.yahoo.com/neo/groups/domaindrivendesign/conversations/topics/2261

Here's an interesting article from ACM on the matter: http://dl.acm.org/citation.cfm?id=2064685 but it's behind a paywall so I can't actually read it (not an ACM member :( ).

There's also this here answer on a similar question: https://stackoverflow.com/questions/3380431/cqrs-ddd-synching-reporting-database

and this one: http://snape.me/2013/05/03/applying-domain-driven-design-to-data-warehouses/

Hope this helps!

Related Topic