As deterministic as computers themselves are, computer engineering is not an exact science. Two people, given the same problem domain, can perform an analysis and develop two different solutions that both satisfy all constraints of the problem. It may be difficult or impossible to empirically determine which of these is "better" in the general case.
My guess is that the .NET QuickSort is layered on top of something in MFC or the Windows API, and is probably inherited from much older versions of Windows, where the multi-threading advantage of MergeSort wouldn't even have been considered for the computers of the day. (EDIT: it's not, though Microsoft developers have been QuickSort fanboys for a long time, as evidenced by this choice of sorting implementation going back to MS-DOS.)
Java, which can't use any platform-specific implementation because it was designed from the ground up to be completely platform-independent, went a different way. Who knows why MergeSort came out on top; my wild guess is that the implementation won some sort of performance competition against other sorts the developers tried, or else that an O(n)-space MergeSort simply looked best on paper in terms of best-case and worst-case behavior (MergeSort has no Achilles' heel relating to pivot selection the way QuickSort does, and its best case, a nearly sorted list, is often QuickSort's worst). I doubt multithreading benefits were initially considered, but the current implementation may well be multithreaded.
If you look at your code for swapping, you have:
// If current element is lower than pivot
// then swap it with the element at store_index
// and move the store_index to the right.
But roughly 50% of the time, the string you just swapped needs to be moved back again later, which is why faster QuickSort partition schemes work from both ends of the range at the same time.
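For comparison, here's a minimal sketch of a two-pointer (Hoare-style) partition that scans from both ends, written against the same char* array and inclusive indices as your snippets. The function name is mine, and note that with this scheme you recurse on [start, j] and [j+1, end] rather than around a single pivot index:

#include <string.h>

/* Hoare-style partition: scan inward from both ends, swapping out-of-place
   strings so each element is moved at most once per pass. */
static int partition_both_ends(char *lines[], int start, int end)
{
    char *pivot = lines[start + (end - start) / 2];
    int i = start - 1;
    int j = end + 1;

    for (;;) {
        do { i++; } while (strcmp(lines[i], pivot) < 0);
        do { j--; } while (strcmp(lines[j], pivot) > 0);
        if (i >= j)
            return j;          /* recurse on [start, j] and [j+1, end] */
        char *tmp = lines[i];
        lines[i] = lines[j];
        lines[j] = tmp;
    }
}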
Next, if you check whether a sub-range actually contains more than one element before making each recursive call, you avoid wasting time calling a function only to have it exit immediately. This happens around 10,000,000 times in your final test, which does add a noticeable amount of time.
Use:

if (pivot_index - 1 > start)
    quick_sort(lines, start, pivot_index - 1);
if (pivot_index + 1 < end)
    quick_sort(lines, pivot_index + 1, end);
You still want an outer function to do an initial if (start < end), but that check only needs to happen once, so that function can just call an unchecked version of your code that skips the outer comparison.
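As a sketch of that split (quick_sort_unchecked and partition are placeholder names standing in for your own functions):

/* Your existing partition routine, whatever its real name is. */
int partition(char *lines[], int start, int end);

/* Recursive body: assumes start < end has already been verified. */
static void quick_sort_unchecked(char *lines[], int start, int end)
{
    int pivot_index = partition(lines, start, end);
    if (pivot_index - 1 > start)
        quick_sort_unchecked(lines, start, pivot_index - 1);
    if (pivot_index + 1 < end)
        quick_sort_unchecked(lines, pivot_index + 1, end);
}

/* Public entry point: the start < end check happens exactly once here. */
void quick_sort(char *lines[], int start, int end)
{
    if (start < end)
        quick_sort_unchecked(lines, start, end);
}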
Also, picking a random pivot tends to avoid O(N^2) worst-case behavior, but it's probably not a big deal with your random data set.
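If you do want a random pivot, a minimal sketch is to swap a randomly chosen element into the pivot slot before partitioning. This assumes your partition routine takes lines[end] as its pivot; adjust the target index if it uses a different slot:

#include <stdlib.h>

/* Swap a randomly chosen element into the pivot slot before partitioning. */
static void randomize_pivot(char *lines[], int start, int end)
{
    int r = start + rand() % (end - start + 1);
    char *tmp = lines[r];
    lines[r] = lines[end];
    lines[end] = tmp;
}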
Finally, the hidden problem is that QuickSort ends up comparing strings in ever smaller buckets whose contents are ever closer together (Edit: so AAAAA, AAAAB, AAAAC, AAAAD, then AAAAA, AAAAB; strcmp has to step through a lot of A's before it reaches the useful parts of the strings), but with MergeSort you look at the smallest buckets first, while their contents are still very random. MergeSort's final passes do compare a lot of strings that are close to each other, but it's less of an issue by then. One way to make QuickSort faster for strings is to compare the first characters of the outer strings and, if they're the same, skip them when doing the inner comparisons, but you have to be careful that every string is long enough that you're not skipping past the null terminator.
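Here's what that prefix-skipping comparison might look like; strcmp_skip is a hypothetical helper, and the caller is responsible for only passing a skip value that is safely within every string in the bucket:

#include <string.h>

/* Compare two strings, skipping a prefix already known to be identical.
   The caller must guarantee both strings have at least `skip` characters,
   or this reads past the null terminator. */
static int strcmp_skip(const char *a, const char *b, size_t skip)
{
    return strcmp(a + skip, b + skip);
}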
Best Answer
"performance guarantee" is not a term typically used in analyzing algorithms.
Time complexity is measured in terms of whatever you want. You've stumbled on a dirty secret of CS: Big-O notation is often used rather sloppily, without specifying exactly what quantity grows with the input size in that manner. It's generally the count of some primitive operation that's assumed to dominate the others in the implementation, and that's assumed to take constant time itself. Neither assumption is universally true. For example, for sorting algorithms time complexity is usually stated in terms of the number of comparisons. But comparing numbers is not itself a constant-time operation if the numbers get really big.
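For instance, comparing two arbitrary-precision integers stored as digit arrays costs time proportional to the number of digits, not O(1). A toy sketch (not any particular bignum library):

#include <stddef.h>

/* Compare two equal-length "big numbers" stored as arrays of digits,
   most significant digit first. The cost is O(n_digits), not O(1). */
static int bignum_cmp(const unsigned char *a, const unsigned char *b, size_t n_digits)
{
    for (size_t i = 0; i < n_digits; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}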