NoSql and Data-Warehouse

data-warehousenosql

What are the relations between NoSql and Data-Warehouse technologies/theories?

What concepts they share?

What are the basic differences between them?

How do you think each could be benefits/enriches from the other?

I think your ideas should be helpful for the future of both technologies.

UPDATE:

Some useful links:

Integrating NoSQL in the Data Warehouse
NoSQL and Data Warehousing
Are You Ready for Big Data?

2nd UPDATE:

MongoDB, BI and Non-Relational Databases

Best Answer

Data Warehouses have very little in common with NoSQL - the main similarity is that any two data warehouses can have very different philosopohies or conventions just like any two NoSQL systems can be nearly unrelated.

The only concept they share is that they are both used to analyze large amounts of data.

NoSQL solutions usually manage relatively limited schemas with large cardinality in few entities, while data warehouses typically have lots of facts and dimensions (in a dimensional model) or lots of entities in a 3NF model. DW systems usually manage multiple lines of business and attempt to combine that data.

DW systems typically have reporting abilities in SQL which allows you to access all the data in a standard way. NoSQL systems are typically more code-based - for instance Map/Reduce.

Related Solutions

R – Object Database, business intelligence and warehousing

Multidimensionality is an essential feature of data warehousing.

It is not a "workaround" for limitations of the relational model. It is the best way to model data where you need to do arbitrary "slice and dice" analysis of facts with respect to multiple non-trivial dimensions.

Star-schema queries are not very complex. They're actually very simple, since they're almost always of the form SELECT SUM(MEASURE) FROM FACT JOIN DIM1 ON ... JOIN DIM2 ON ... WHERE....

Join operations are -- generally -- slow. However, the joins can be done in object-oriented code instead of a SQL warehouse.

In many cases, most dimensions are actually rather small and fit entirely in memory. The analysis queries can devolve to simple fetches of all the facts followed by in-memory lookups of dimensional attributes.

In the remaining cases, we have a snowflake schema where a dimension (usually a customer, patient or member dimension) is almost as large as the relevant fact table. In this case, a relational join in the database is helpful.

"The object databases don't have joins because relations between objects are maintained by direct references."

Isn't completely true. Object databases have navigation from object to object. If you retrieve a set of objects along with their related objects, a join operation will -- in effect -- have been performed.

"The question is do we need the same concepts (multidimensional database - data warehouse, etc) when we talk about business intelligence for object database?"

Yes. Multidimensionality is essential. Absolutely. An object database will represent this just as well (or perhaps better) than a relational database. Either model, however, must represent the essential truth of Measures and their Dimensions.

Data Warehouse Considerations: When and Why?

I'll see if I can do my best to answer your questions succinctly.

1.At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?

a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.

b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.

c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.

d. If you want to bring together data from multiple sources.

2.What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?

3.Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?

I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."

Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.

4.When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?

Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.

If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.

5.Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.

That's a big question that would take far more space than I'm allotted here.

On this one, I can point you to a few places that might provide the insight you seek.

"Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
"Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
"The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.

Best Answer

Related Solutions

R – Object Database, business intelligence and warehousing

Data Warehouse Considerations: When and Why?

Related Topic