If you look at the swap step in your code:
// If current element is lower than pivot
// then swap it with the element at store_index
// and move the store_index to the right.
But ~50% of the time the string you just swapped needs to be moved back again, which is why faster quicksort partitioning schemes (Hoare-style) work from both ends at the same time (see the sketch below).
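For reference, here is a minimal sketch of a Hoare-style partition that scans from both ends, assuming an array of C strings like the question's lines and strcmp-based comparisons; the name hoare_partition is just illustrative, not part of the original code.

#include <string.h>

static int hoare_partition(char **lines, int start, int end)
{
    char *pivot = lines[start + (end - start) / 2];
    int i = start - 1;
    int j = end + 1;

    for (;;) {
        do { i++; } while (strcmp(lines[i], pivot) < 0);
        do { j--; } while (strcmp(lines[j], pivot) > 0);
        if (i >= j)
            return j;            /* everything <= pivot is left of j */
        char *tmp = lines[i];    /* swap the out-of-place pair */
        lines[i] = lines[j];
        lines[j] = tmp;
    }
}

With this scheme the recursive calls would be on lines[start..j] and lines[j+1..end] rather than splitting around a single pivot index.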
Next, if you check that a sub-range actually contains more than one element before making each recursive call, you avoid wasting time calling a function only for it to return immediately. This happens 10,000,000 times in your final test, which adds a noticeable amount of time.
Use,
if (pivot_index - 1 > start)
    quick_sort(lines, start, pivot_index - 1);
if (pivot_index + 1 < end)
    quick_sort(lines, pivot_index + 1, end);
You still want an outer wrapper function that does an initial if (start < end) check, but that only needs to happen once, so the wrapper can just call an unchecked version of your sort that skips that comparison (a sketch follows).
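A minimal sketch of that wrapper idea, using the question's quick_sort(lines, start, end) signature; quick_sort_unchecked and partition are hypothetical names, and partition stands in for your existing store_index-based partition step, assumed to be defined elsewhere.

/* Sketch only: 'partition' is assumed to be your existing partition step. */
static int partition(char **lines, int start, int end);

static void quick_sort_unchecked(char **lines, int start, int end)
{
    int pivot_index = partition(lines, start, end);
    if (pivot_index - 1 > start)
        quick_sort_unchecked(lines, start, pivot_index - 1);
    if (pivot_index + 1 < end)
        quick_sort_unchecked(lines, pivot_index + 1, end);
}

void quick_sort(char **lines, int start, int end)
{
    if (start < end)              /* this test now runs exactly once */
        quick_sort_unchecked(lines, start, end);
}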
Also, picking a random pivot tends to avoid O(n²) worst-case behaviour, but it's probably not a big deal with your random data set (a rough sketch follows).
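If you want to try it, here is a rough sketch of random pivot selection bolted onto a Lomuto-style (store_index) partition like the one in the question; partition_random is an illustrative name, and rand() would need to be seeded somewhere with srand().

#include <stdlib.h>   /* rand */
#include <string.h>   /* strcmp */

static int partition_random(char **lines, int start, int end)
{
    /* Move a randomly chosen element into the pivot slot (the end) first. */
    int r = start + rand() % (end - start + 1);
    char *tmp = lines[r];
    lines[r] = lines[end];
    lines[end] = tmp;

    char *pivot = lines[end];
    int store_index = start;
    for (int i = start; i < end; i++) {
        if (strcmp(lines[i], pivot) < 0) {   /* lower than pivot: swap */
            tmp = lines[i];
            lines[i] = lines[store_index];
            lines[store_index] = tmp;
            store_index++;
        }
    }
    tmp = lines[store_index];                /* move the pivot into place */
    lines[store_index] = lines[end];
    lines[end] = tmp;
    return store_index;
}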
Finally, the hidden problem is that QuickSort ends up comparing strings in ever smaller buckets whose contents are ever closer together,
(Edit: e.g. AAAAA, AAAAB, AAAAC, AAAAD, then AAAAA, AAAAB. strcmp needs to step through a lot of A's before reaching the useful parts of the strings.)
but with merge sort you look at the smallest buckets first, while their contents are still very random. Merge sort's final passes do compare a lot of strings that are close to each other, but it's less of an issue by then. One way to make quicksort faster for strings is to compare the leading characters of a sub-range's two boundary strings and, if they're the same, skip that shared prefix when doing the inner comparisons; but you have to be careful that all the strings are long enough that you're not skipping past the null terminator (see the sketch below).
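Here is a rough sketch of that common-prefix idea; the helper names are made up, and it relies on the fact that if the smallest and largest strings of a sub-range share a prefix, every string lexicographically between them shares it too.

#include <string.h>

/* Length of the prefix shared by the range's boundary strings.
   Stops at lo's null terminator, so we can never skip past it. */
static size_t shared_prefix(const char *lo, const char *hi)
{
    size_t n = 0;
    while (lo[n] != '\0' && lo[n] == hi[n])
        n++;
    return n;
}

/* Compare two strings from that range, skipping the known common prefix. */
static int cmp_after_prefix(const char *a, const char *b, size_t prefix)
{
    return strcmp(a + prefix, b + prefix);
}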
Expected time is just the average (expected) running time of the algorithm on the intended input.
Say you've got a few million user records and want to sort them; you might want to use the algorithm that is most suitable for your input, and as such gives the best expected running time, as opposed to an algorithm that has a better worst-case running time but a worse expected running time.
Sometimes, for example, the constant factors in an algorithm's time complexity are so high that it makes sense to use an algorithm with worse time complexity but smaller constant factors, as it gives better expected running time on small input, even though it would get horribly outperformed on bigger input (insertion sort beating merge sort on very small arrays is the classic case).
Perhaps a better example would be the classic quicksort algorithm, which has a worst-case running time of O(n²) but an expected running time of O(n log n), regardless of input. This is because the algorithm uses (or rather, may use, depending on the implementation) randomization, making it a so-called randomized algorithm. It runs a bit differently on every invocation, even with the same input. As such, there's no universal worst-case input for the implementation, because the worst-case input depends on the way the algorithm chooses the pivot for dividing the given input, and so one can't just supply some pre-defined input that causes the worst-case running time. This is often the case with randomized algorithms, which aim for better expected (average) running time regardless of the input.
It's all about using the right algorithm for the input at hand.
Your quote is saying that if the range of integers k is linearly proportional to the total number of elements in the collection n, then a counting sort will have a best-case and a worst-case time complexity of O(n), making it a very efficient sort in such a case.
Big Oh (and the others in the Bachmann–Landau family of notations) are just simplifications of the precise complexity of a function in any given case, used to illustrate how the value of the function grows as n grows large. In programming, Big Oh is usually used in the context of time complexity: for n values, a function f(n) will execute in time on the order of g(n), and thus we say it has a Big Oh complexity of O(g(n)). However, the mathematical construct of Big Oh isn't so limited. In this particular case, it is being used to refer to the general relationship between k and n as numbers.
Simply put, when k (the range of the values of an n-element collection) increases as n does, on the order of O(n), then the counting sort will be efficient. This will be true if, say, the list contained all multiples of 4 between 1 and x (in which case k ~= x and n = x/4, meaning that k ~= 4n = O(n)). The condition k = O(n) would be false for, say, the set of all perfect squares from 1 to x² (in which case k = x² but n = x, so k = n² = O(n²)). In that case, k increases quadratically (O(n²)) while n increases only linearly (O(n)), so the counting sort would execute in the more complex time, O(n²), which would perform worse than some other sort implementation that runs in Θ(n log n).
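For illustration, here is a minimal counting sort sketch, assuming non-negative integers whose range k is known in advance; it runs in O(n + k), which is why it is linear exactly when k = O(n). The names are illustrative, not from the answer above.

#include <stdlib.h>   /* calloc, free, size_t */

void counting_sort(int *a, size_t n, int k)    /* values assumed in [0, k) */
{
    size_t *count = calloc((size_t)k, sizeof *count);
    if (count == NULL)
        return;                                /* allocation failed */

    for (size_t i = 0; i < n; i++)             /* O(n): tally each value */
        count[a[i]]++;

    size_t out = 0;
    for (int v = 0; v < k; v++)                /* O(k): write sorted output */
        for (size_t c = 0; c < count[v]; c++)
            a[out++] = v;

    free(count);
}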