Designing a data warehouse: action or status based

data-warehousedatabase-design

I am designing a data warehouse for a sales platform. There exists a group of users that offers and a group of users that demand entities.
The production data base holds two sources of information:

  1. The current status of each entity and to which users it is assigned (this can be versatile, "placed a price for an order", "bought", "offered"). This can thus involve multiple users depending on the status.
  2. An action log

Accordingly, there are two types of questions that the data warehouse can be asked. E.g.:

  1. How many offers have been without an order yesterday at 9 am? (Status)
  2. How many entities have been sold yesterday? (Action)

I aim for a simple star-schema where dimensions could be time, place, users, etc. But the question is: What would be the fact?

First approach

My first idea was to introduce a fact table for every status. Then every fact would need a starting and end validity and as soon as the action log says, that a status changed, I need to get back to the fact table where the current status of the entity is stored and add the end-validity time stamp and afterwards add the new fact in the table of the new status of the entity (with a missing end-validity time stamp).

With this approach, it is quite simple to answer question 1 (Status) but impossible to answer question 2 (Action). Also, It seems non-trivial to update the data warehouse on each action.

Second approach

An alternative idea was to simply store the action log in a star-schematized way. Every type of action becomes a fact. (Of course, the question 2 (Action) is most simple to answer but what about question 1 (Status)?

My Idea was to create date-parameterized views for each status:
For Example, to get all offers without an order yesterday at 9 am, I would have to get all actions that say "offered something" from before yesterday at 9am and I would have to subtract all offers that

  1. an order has been placed on
  2. that have been canceled until yesterday 9 am

I assume that the time-efficiency with this solution could be nonoptimal for reading. Although I thought, that data warehouses aim for trading write-efficiency for read-efficiency.

So my question again: How to design the fact-table to be able to answer both above questions with the minimum of complexity?

Best Answer

What is being described by the OP is a "sales funnel" scenario, and you want to track movement of (something you have not clearly explained) through this funnel. That "something" might be "a real estate sale" or "hiring a job candidate", for example. The point is, the "something" is a process, and this process is what you are describing in your question.

You don't talk about what the "entity" is in your scenario - and it's not clear if the entity is the:

  • "prospect/customer" (the buyer(s) or job candidate(s)) - the people/companies being targeted by the sales team
  • "product/service" (the real estate or the job role(s)) - what the "sales team" is selling to the "prospect"
  • "sales team" (the realtor(s) or the interviewer(s)) - company's people who would be working on moving the "prospect" through the "funnel"

Regardless, each of these are dimensions to the sales funnel process.

You are thinking of it wrong in terms of "status". What you really are tracking is movement of a "prospect" through stages/steps a "sales funnel". The sales funnel is a process of steps (your "statuses") and a "prospect" moves through 1 or more of those steps.

You CAN model each step in the funnel as it's own FACT and you can have a FUNNEL_TRANSACTIONS_FCT potentially to link them, but that doesn't meet your requirement of a dynamic set of steps (statuses, as you call them).

A prospect moves to a different funnel step when some "action" (transaction) occurs e.g. quote requested, quote provided, quote accepted, order placed, contract is signed, order is canceled, etc. So, let each "status" (as you call it) be a transaction type (which would be a dimension in and of itself - to which new "statuses" can be added, and changes can be made, etc).

It is clear from here that your FACT table rows represent these "actions/transactions", the result of which lands the prospect in some step of the funnel at some certain time.

Related Topic