If you're asking about the MapReduce architecture, then it is very much just a divide-and-conquer technique. However, any useful MapReduce deployment will have mountains of other infrastructure in place to efficiently "divide", "conquer", and finally "reduce" the problem set. In a large MapReduce deployment (thousands of compute nodes), the steps to partition the work, compute something, and finally collect all the results are non-trivial. Load balancing, dead-node detection, and saving interim state (for long-running problems) are hard problems by themselves.
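To get a feel for the shape (not the infrastructure), here is a toy single-process sketch in C; all names are illustrative. The "divide" and "reduce" steps are just loops here, where a real deployment would ship each chunk to a different node and wrap the whole thing in scheduling, retries, and checkpointing:

    #include <stdio.h>

    #define N 12
    #define CHUNKS 3

    int main(void)
    {
        int data[N] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8};
        long partial[CHUNKS] = {0};

        /* "Divide": split the input into CHUNKS pieces, then "compute"
           a partial result for each piece independently. */
        for (int c = 0; c < CHUNKS; c++)
            for (int i = c * (N / CHUNKS); i < (c + 1) * (N / CHUNKS); i++)
                partial[c] += data[i];

        /* "Reduce": combine the partial results into the final answer. */
        long total = 0;
        for (int c = 0; c < CHUNKS; c++)
            total += partial[c];

        printf("total = %ld\n", total);
        return 0;
    }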
If you look at the swap step in your code:
// If current element is lower than pivot
// then swap it with the element at store_index
// and move the store_index to the right.
But ~50% of the time the string you just swapped needs to be moved back again, which is why faster partitioning schemes (like Hoare's) work from both ends at the same time.
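Here is a rough sketch of what a both-ends (Hoare-style) partition over your char *lines[] might look like; partition_hoare and the pivot parameter are my names, not from your code, and pivot must be a value taken from the range (e.g. lines[start]):

    #include <string.h>

    /* Hoare-style partition: scan from both ends and swap pairs that
       are each on the wrong side, so a swap never has to be undone.
       The caller then recurses on [lo, j] and [j+1, hi]. */
    static int partition_hoare(char *lines[], int lo, int hi, const char *pivot)
    {
        int i = lo - 1, j = hi + 1;
        for (;;) {
            do { i++; } while (strcmp(lines[i], pivot) < 0);
            do { j--; } while (strcmp(lines[j], pivot) > 0);
            if (i >= j)
                return j;
            char *tmp = lines[i];   /* both were on the wrong side, */
            lines[i] = lines[j];    /* so one swap fixes two positions */
            lines[j] = tmp;
        }
    }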
Next, if you check that a sub-range still contains more than one element before making each recursive call, you avoid wasting time calling a function only to exit it immediately. This happens roughly 10,000,000 times in your final test, which adds a noticeable amount of time.
Use:
if (pivot_index - 1 > start)
quick_sort(lines, start, pivot_index - 1);
if (pivot_index + 1 < end)
quick_sort(lines, pivot_index + 1, end);
You still want an outer function to do an initial if (start < end) check, but that only needs to happen once, so the outer function can call an unchecked version of your code that skips the comparison.
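For example (quick_sort_unchecked is an illustrative name, and partition() is assumed to place the pivot and return its index):

    static void quick_sort_unchecked(char *lines[], int start, int end);

    /* Public entry point: performs the start < end check exactly once. */
    void quick_sort(char *lines[], int start, int end)
    {
        if (start < end)
            quick_sort_unchecked(lines, start, end);
    }

    /* Recursive worker: callers guarantee start < end, and the two
       guarded calls below preserve that invariant, so no check here. */
    static void quick_sort_unchecked(char *lines[], int start, int end)
    {
        int pivot_index = partition(lines, start, end);
        if (pivot_index - 1 > start)
            quick_sort_unchecked(lines, start, pivot_index - 1);
        if (pivot_index + 1 < end)
            quick_sort_unchecked(lines, pivot_index + 1, end);
    }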
Also, picking a random pivot tends to avoid O(N^2) worst-case behavior, but it's probably not a big deal with your random data set.
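If you want to try it, one extra swap before partitioning is enough, assuming your partition step pivots on lines[end]:

    #include <stdlib.h>

    /* Swap a uniformly chosen element into the pivot slot before
       partitioning, so adversarial (sorted/reversed) inputs behave
       like random ones on average. */
    static void randomize_pivot(char *lines[], int start, int end)
    {
        int r = start + rand() % (end - start + 1);
        char *tmp = lines[r];
        lines[r] = lines[end];
        lines[end] = tmp;
    }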
Finally, the hidden problem is that quicksort ends up comparing strings in ever smaller buckets that are ever closer together.
(Edit: e.g. AAAAA, AAAAB, AAAAC, AAAAD, then AAAAA, AAAAB. So strcmp needs to step through a lot of A's before reaching the useful parts of the strings.)
With merge sort, by contrast, you look at the smallest buckets first, while they are still very random. Merge sort's final passes do compare a lot of strings that are close to each other, but it's less of an issue by then. One way to make quicksort faster for strings is to compare the first characters of the outer strings and, if they are the same, skip them when doing the inner comparisons; but you have to be careful that all the strings share enough characters that you're not skipping past the null terminator.
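A sketch of that prefix-skipping idea (helper names are mine; it assumes you have established that every string in the bucket shares the prefix, e.g. from the pivots bounding it). Because common_prefix never counts past a terminator, the skipped comparison can't run off the end of either string:

    #include <string.h>

    /* Length of the common prefix of two strings. Stops at the first
       difference or terminator, so it never reads past either string. */
    static size_t common_prefix(const char *a, const char *b)
    {
        size_t n = 0;
        while (a[n] != '\0' && a[n] == b[n])
            n++;
        return n;
    }

    /* Compare two strings already known to share `skip` leading
       characters: start strcmp after the shared prefix. */
    static int cmp_skipping(const char *a, const char *b, size_t skip)
    {
        return strcmp(a + skip, b + skip);
    }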
Best Answer
That is not the essence of divide-and-conquer algorithms. Usually the point is that the algorithm cannot "deal with the whole data set" directly. Instead, the set is divided into pieces that are trivial to solve (like sorting two numbers); those pieces are solved, and the results are recombined in a way that yields a solution for the full data set.
Mainly because splitting into more than two parts and recombining more than two partial results makes the implementation more complex but doesn't change the fundamental (big-O) characteristic of the algorithm. The difference is a constant factor, and it may even cause a slowdown if dividing and recombining more than two subsets creates additional overhead.
For example, if you do a 3-way merge sort, then in the recombination phase you have to find the biggest of 3 elements for every element, which requires 2 comparisons instead of 1, so you'll do twice as many comparisons per merge step. In exchange, you reduce the recursion depth by a factor of ln(2)/ln(3) ≈ 0.63, so you have 37% fewer passes (and swaps), but 2 × 0.63 = 1.26, i.e. 26% more comparisons (and memory accesses) overall. Whether that is good or bad depends on which is more expensive on your hardware.
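To see where the 2 comparisons come from, here is one step of a 3-way recombination in C (head[] holding the current front of each run is illustrative, and run-exhaustion checks are omitted):

    #include <string.h>

    /* Choosing the next string to emit among three run heads costs two
       comparisons per output element; a 2-way merge needs only one. */
    static int next_of_three(char *head[3])
    {
        int best = 0;
        if (strcmp(head[1], head[best]) < 0) best = 1;
        if (strcmp(head[2], head[best]) < 0) best = 2;
        return best;
    }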
Apparently, a dual-pivot variant of quicksort can be proven to require the same number of comparisons but on average 20% fewer swaps, so it's a net gain.
These days hardly anyone programs their own sorting algorithms anymore; they use one provided by a library. For example, the Java 7 API actually uses the dual-pivot quicksort.
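In C that means qsort from <stdlib.h>; the only subtlety for an array of strings is that the comparator receives pointers to your char* elements and must dereference them:

    #include <stdlib.h>
    #include <string.h>

    /* qsort passes pointers to the array elements; each element here is
       a char*, so the comparator receives char** in disguise. */
    static int cmp_strings(const void *a, const void *b)
    {
        return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    /* Usage: qsort(lines, count, sizeof lines[0], cmp_strings); */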
People who do program their own sorting algorithm for some reason tend to stick to the simple 2-way variant, because less potential for errors beats 20% better performance most of the time. Remember: by far the most important performance improvement is when the code goes from "not working" to "working".