Sorting Methods – Why Sort by 1, 10, 2, 3…?

sorting

I've noticed that many numerical sorting methods seem to sort by 1, 10, 2, 3… rather than the expected 1, 2, 3, 10… I'm having trouble coming up with a scenario where I would need the first method and, as a user, I get frustrated whenever I see it in practice. Are there legitimate use cases for the first style over the second? If so, what are they? If not, how did the first sort style ever come into being? What are the official names for each sort method?

Best Answer

That is lexicographic sorting: the language treats the values as strings and compares them character by character, left to right. For example, "200" is greater than "19999" because the first character '2' is greater than '1', and the comparison stops there.
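As a minimal illustration of that character-by-character comparison, here is a short Python sketch. The sample values are made up for the example; any list of digit strings would behave the same way.

```python
# Sample values (hypothetical); each one is a string, not a number.
values = ["1", "2", "3", "10", "200", "19999"]

# Sorting strings compares them character by character,
# so "10" lands before "2" because '1' < '2'.
lexicographic = sorted(values)
print(lexicographic)  # ['1', '10', '19999', '2', '200', '3']

# The same rule explains the comparison from the answer.
print("200" > "19999")  # True: '2' > '1' decides it at the first character
print(200 > 19999)      # False: as integers, the values are compared numerically
```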

To fix this, you can:

  • ensure that the values are treated as integers,

  • prepend '0' characters to the strings so they all have the same length (only viable when you know the maximum value).
    This is why episode numbers on media files are zero-padded (S1E01): a lexicographic sort then gives the right order, and programs can simply play or display the files alphabetically,

  • or write a custom comparator that first compares the lengths of the strings (a shorter string is a smaller integer) and, when the lengths are equal, compares them lexicographically (be careful with leading '0' characters, which make the length misleading).
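The three options above can be sketched in Python as follows. This is only an illustration under stated assumptions: the inputs are strings of non-negative decimal digits, and the names `values` and `compare_numeric_strings` are hypothetical, chosen for the example.

```python
from functools import cmp_to_key

# Hypothetical input: non-negative integers stored as strings.
values = ["3", "10", "1", "200", "2", "19999"]

# Option 1: treat the values as integers while sorting.
by_value = sorted(values, key=int)

# Option 2: left-pad with '0' to a common width, then sort lexicographically.
# The width has to cover the longest value, so the maximum must be known.
width = max(len(v) for v in values)
padded = sorted(v.zfill(width) for v in values)

# Option 3: custom comparator. Compare lengths first, then characters.
def compare_numeric_strings(a, b):
    # Strip leading zeros so the length reflects the magnitude;
    # an all-zero string collapses to "0".
    a = a.lstrip("0") or "0"
    b = b.lstrip("0") or "0"
    if len(a) != len(b):
        return len(a) - len(b)   # shorter string is the smaller integer
    return (a > b) - (a < b)     # equal length: lexicographic order is numeric order

by_comparator = sorted(values, key=cmp_to_key(compare_numeric_strings))

print(by_value)       # ['1', '2', '3', '10', '200', '19999']
print(padded)         # ['00001', '00002', '00003', '00010', '00200', '19999']
print(by_comparator)  # ['1', '2', '3', '10', '200', '19999']
```

The comparator never converts the strings to integers, which can be useful when the values are too long for a fixed-width integer type in other languages. In Python, where integers are arbitrary precision, `key=int` is usually the simplest choice.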
