If you look at your code for swapping, you have:
// If current element is lower than pivot
// then swap it with the element at store_index
// and move the store_index to the right.
But ~50% of the time the string you just swapped needs to be moved back, which is why faster partition schemes (like Hoare's original) work from both ends at the same time.
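A two-ended partition (Hoare's original scheme) moves each element at most once per pass. A minimal sketch, assuming `lines` is an array of C strings as in your code:

```c
#include <string.h>

/* Hoare-style partition: scan inward from both ends and swap
 * mismatched pairs, so each element is moved at most once per pass.
 * Assumes `lines` is an array of C strings, as in the question. */
static int partition_hoare(char **lines, int start, int end)
{
    char *pivot = lines[(start + end) / 2];
    int i = start - 1, j = end + 1;
    for (;;) {
        do { i++; } while (strcmp(lines[i], pivot) < 0);
        do { j--; } while (strcmp(lines[j], pivot) > 0);
        if (i >= j)
            return j;           /* j is the split point */
        char *tmp = lines[i];   /* swap the out-of-place pair */
        lines[i] = lines[j];
        lines[j] = tmp;
    }
}
```

The recursion then uses the returned split point: quick_sort(lines, start, j) and quick_sort(lines, j + 1, end).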
Next, if you check whether the first and last elements are the same before making each recursive call, you avoid wasting time calling a function only to have it exit immediately. This happens roughly 10,000,000 times in your final test, which adds a noticeable amount of time.
Use:

    if (pivot_index - 1 > start)
        quick_sort(lines, start, pivot_index - 1);
    if (pivot_index + 1 < end)
        quick_sort(lines, pivot_index + 1, end);
You still want an outer function to do an initial if (start < end), but that only needs to happen once, so the outer function can just call an unchecked version of your code without that comparison.
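A sketch of that structure; the names quick_sort_unchecked and partition here are hypothetical stand-ins (the partition is a minimal Lomuto-style one like yours), not your actual code:

```c
#include <string.h>

/* Minimal Lomuto partition, standing in for the store_index version
 * in the question. */
static int partition(char **lines, int start, int end)
{
    char *pivot = lines[end];
    int store_index = start;
    for (int i = start; i < end; i++) {
        if (strcmp(lines[i], pivot) < 0) {
            char *tmp = lines[i];
            lines[i] = lines[store_index];
            lines[store_index] = tmp;
            store_index++;
        }
    }
    char *tmp = lines[store_index];
    lines[store_index] = lines[end];
    lines[end] = tmp;
    return store_index;
}

/* Recursive version: callers guarantee start < end, and the two
 * checks below prevent recursing into ranges of fewer than two
 * elements, so no call ever exits immediately. */
static void quick_sort_unchecked(char **lines, int start, int end)
{
    int pivot_index = partition(lines, start, end);
    if (pivot_index - 1 > start)
        quick_sort_unchecked(lines, start, pivot_index - 1);
    if (pivot_index + 1 < end)
        quick_sort_unchecked(lines, pivot_index + 1, end);
}

/* Public entry point: the start < end check happens exactly once. */
void quick_sort(char **lines, int start, int end)
{
    if (start < end)
        quick_sort_unchecked(lines, start, end);
}
```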
Also, picking a random pivot tends to avoid O(N^2) worst-case results, but it's probably not a big deal with your random data set.
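A sketch of random pivot selection, assuming a Lomuto-style partition that takes its pivot from lines[end] (rand() with % has a slight modulo bias, which is fine for this purpose):

```c
#include <stdlib.h>

/* Swap a uniformly chosen element into the pivot slot before
 * partitioning, so adversarial inputs (e.g. already-sorted data)
 * are unlikely to hit the O(N^2) worst case.  Assumes the partition
 * takes its pivot from lines[end], Lomuto-style. */
static void choose_random_pivot(char **lines, int start, int end)
{
    int r = start + rand() % (end - start + 1);
    char *tmp = lines[r];
    lines[r] = lines[end];
    lines[end] = tmp;
}
```

Call it at the top of the partition function, before reading the pivot.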
Finally, the hidden problem is that quicksort compares strings in ever smaller buckets that are ever closer together
(Edit: So, AAAAA, AAAAB, AAAAC, AAAAD, then AAAAA, AAAAB. strcmp needs to step through a lot of A's before reaching the useful parts of the strings.)
but with merge sort you look at the smallest buckets first, while they are still very random. Merge sort's final passes do compare a lot of strings close to each other, but it's less of an issue by then. One way to make quicksort faster for strings is to compare the first characters of the outer strings and, if they're the same, skip them when doing the inner comparisons, but you have to be careful that all strings have enough characters that you're not skipping past the null terminator.
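A sketch of that prefix-skipping idea; common_prefix is a hypothetical helper, not from the question's code. Because it stops at the first '\0', the offset it returns can never skip past a terminator, and any string that sorts between two strings sharing a k-character prefix also shares that prefix:

```c
#include <string.h>

/* Length of the common prefix of two C strings.  Stops at the
 * terminator, which is the "careful about the null terminator"
 * part: the returned offset is valid for both strings. */
static size_t common_prefix(const char *a, const char *b)
{
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}
```

If k = common_prefix(lo, hi) for two strings known to bound the bucket, the inner comparisons can use strcmp(a + k, b + k) instead of rescanning the shared prefix every time.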
In 1962, research on sorting algorithms wasn't as far advanced as it is today. The computer scientist Tony Hoare found a new algorithm that was quicker than the others, so he published a paper called "Quicksort", and as the paper was cited, the title stuck.
Quoting the abstract:
A description is given of a new method of sorting in the random-access store of a computer. The method compares very favourably with other known methods in speed, in economy of storage, and in ease of programming. Certain refinements of the method, which may be useful in the optimization of inner loops, are described in the second part of the paper.
That is lexicographic sorting, which basically means the language treats the variables as strings and compares them character by character ("200" is greater than "19999" because '2' is greater than '1'). To fix this you can:

- ensure that the values are treated as integers;

- prepend '0' to the strings so they all have equal lengths (only viable when you know the max value). This is why you'll see episode numberings on media files (S1E01) with a prepended 0: a lexicographic sort doesn't mess things up, and programs can simply play/display them in alphabetical order;

- or make a custom comparator that first compares the lengths of the strings (shorter strings being smaller integers) and, when they are equal, compares them lexicographically (careful about leading '0's).
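The last option can be sketched as a qsort-compatible comparator; numeric_strcmp is a hypothetical name, and it assumes the strings carry no leading zeros:

```c
#include <string.h>

/* Compare numeric strings: a shorter string is a smaller number,
 * and equal lengths fall back to plain lexicographic order.
 * Correct only without leading zeros ("007" would wrongly sort
 * after "10").  Signature matches qsort on an array of char *. */
static int numeric_strcmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}
```

Call it as qsort(arr, n, sizeof(char *), numeric_strcmp);.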