Algorithms – Is MapReduce More Than Just an Application of Divide and Conquer?

algorithms

Dividing a problem to smaller ones until the individual problems can be solved independently and then combining them to answer the original question is known as the divide and conquer algorithm design technique. [See: Introduction to Algorithms by CLR]

Recently, this approach to solve computational problems especially in the domain of very large data sets has been referred to as MapReduce rather than divide and conquer.

My question is as follows: Is MapReduce anything more than a proprietary framework that relies on the divide and conquer approach, or are there details to it that make it unique in some respect?

Best Answer

If you're asking about the MapReduce architecture, then it is very much just a divide and conquer technique. However, any useful MapReduce architecture will have mountains of other infrastructure in place to efficiently "divide", "conquer", and finally "reduce" the problem set. With a large MapReduce deployment (1000's of compute nodes) these steps to partition the work, compute something, and then finally collect all results is non-trivial. Things like load balancing, dead node detection, saving interim state (for long running problems), are hard problems by themselves.