Multidimensionality is an essential feature of data warehousing.
It is not a "workaround" for limitations of the relational model. It is the best way to model data where you need to do arbitrary "slice and dice" analysis of facts with respect to multiple non-trivial dimensions.
Star-schema queries are not very complex. They're actually very simple, since they're almost always of the form SELECT SUM(MEASURE) FROM FACT JOIN DIM1 ON ... JOIN DIM2 ON ... WHERE...
.
Join operations are -- generally -- slow. However, the joins can be done in object-oriented code instead of a SQL warehouse.
In many cases, most dimensions are actually rather small and fit entirely in memory. The analysis queries can devolve to simple fetches of all the facts followed by in-memory lookups of dimensional attributes.
In the remaining cases, we have a snowflake schema where a dimension (usually a customer, patient or member dimension) is almost as large as the relevant fact table. In this case, a relational join in the database is helpful.
"The object databases don't have joins because relations between objects are maintained by direct references."
Isn't completely true. Object databases have navigation from object to object. If you retrieve a set of objects along with their related objects, a join operation will -- in effect -- have been performed.
"The question is do we need the same concepts (multidimensional database - data warehouse, etc) when we talk about business intelligence for object database?"
Yes. Multidimensionality is essential. Absolutely. An object database will represent this just as well (or perhaps better) than a relational database. Either model, however, must represent the essential truth of Measures and their Dimensions.
I'll see if I can do my best to answer your questions succinctly.
1.At what point is building a data warehouse an option worth considering?
In other words, what telltale signs,
metrics, or other criteria should I be
looking out for that might indicate
that a standard transactional
environment is no longer sufficient?
a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.
b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.
c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.
d. If you want to bring together data from multiple sources.
2.What are the alternatives to a full-on data warehouse?
Denormalization in the transactional
database and the bog-standard
replicated "report server" are two
that come to mind; are there any
others I should explore before
committing to the DW?
3.Why is a data warehouse better than said alternatives? If the answer is,
"it depends", then what does it depend
on?
I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."
Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.
4.When shouldn't I attempt to build a data warehouse? I'm skeptical of
anything declared as a "best practice"
irrespective of context. Surely there
must be some scenarios where a DW is
the wrong choice - what are they?
Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.
If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.
5.Are there any practical examples I could look at of systems that were
improved by introducing a data
warehouse? Something that would
explain to me, end-to-end, what sorts
of decisions or analysis they needed
the warehouse for, how they decided
what to put in it, and how the
warehouse ended up fitting into the
larger environment? I don't want a
contrived "let's make a cube out of
the AdventureWorks database" - the
implementation is irrelevant to me,
I'm interested in the specifications
and designs and overall thought
process that were involved.
That's a big question that would take far more space than I'm allotted here.
On this one, I can point you to a few places that might provide the insight you seek.
- "Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
- "Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
- "The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.
Best Answer
Data Warehouses have very little in common with NoSQL - the main similarity is that any two data warehouses can have very different philosopohies or conventions just like any two NoSQL systems can be nearly unrelated.
The only concept they share is that they are both used to analyze large amounts of data.
NoSQL solutions usually manage relatively limited schemas with large cardinality in few entities, while data warehouses typically have lots of facts and dimensions (in a dimensional model) or lots of entities in a 3NF model. DW systems usually manage multiple lines of business and attempt to combine that data.
DW systems typically have reporting abilities in SQL which allows you to access all the data in a standard way. NoSQL systems are typically more code-based - for instance Map/Reduce.