Algorithms – Generating Sort Keys When Reordering Items


We have a number of items which the end user will be able to organize into a desired order. The set of items is unordered, but each item contains a sort key which can be modified.

We're looking for an algorithm that would allow generating a new sort key for an item that is added or moved to be either the first item, last item, or between any two items. We're hoping to only have to modify the sort key of the item being moved.

An example algorithm would be to have each sort key be a floating point number, and when placing an item between two items, set the sort key to be their average. Placing an item first or last would take the outermost value minus or plus 1, respectively.

The problem here is that floating point precision could cause the sorting to fail. Using two integers to represent a fractional number could similarly have the numbers become so large that they couldn't be accurately represented in regular numerical types (e.g. when transferring as JSON). We wouldn't want to use BigInts.

Is there a suitable algorithm for this that would work, for example, using strings, which wouldn't be affected by these shortcomings?

We're not looking to support huge numbers of moves, but the algorithm described above could fail on a double-precision floating-point number after about 50 moves.

Best Answer

As a summary of all comments and answers:

TL;DR - Using double-precision floating point numbers with the initially proposed algorithm should be sufficient for most practical (at least manually-ordered) needs. Maintaining a separate ordered list of the elements should be considered as well. Other sort key solutions are somewhat cumbersome.

The two problematic operations are inserting items at the beginning / end over and over again, and repeatedly inserting or moving items to the same spot (e.g. with three elements repeatedly moving the third element between the first two, or repeatedly adding new elements as the second element).

From a theoretical point of view (i.e. allowing infinite reordering), the only solution I can think of is using two unbounded integers as a fraction a/b. This allows infinite precision for the mid-inserts, but the numerator and denominator can become arbitrarily large.
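As a rough illustration of how quickly such a fraction grows, here is a minimal Python sketch using the standard library's fractions.Fraction (which is backed by unbounded integers). The function name and the choice of 0 and 1 as starting keys are assumptions made for the example, not part of the original answer.

```python
from fractions import Fraction

def repeated_mid_inserts(n):
    """Insert n times directly after the key 0, each time halving the gap.

    Returns the key produced by the last insert. The starting keys 0 and 1
    are an arbitrary choice for this sketch.
    """
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(n):
        hi = (lo + hi) / 2  # exact average, never loses precision
    return hi

key = repeated_mid_inserts(60)
# The denominator doubles on every insert, so after 60 inserts it no longer
# fits in the 53 bits that a JSON number parsed as a double can hold exactly.
print(key.denominator > 2**53)
```

The sort order stays exact, but both integers would need a big-integer representation when stored or transferred, which is what the question wanted to avoid.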

Strings may be able to support a large number of updates (though I'm still having trouble figuring out the algorithm for both operations), but not infinitely many, because you cannot keep adding items at the first position forever (at least using regular string sort comparison).

Integers would require choosing an initial spacing for the sort keys, which limits how many mid-inserts you can perform. If you initially space sort keys 1024 apart, you can only perform 10 worst-case mid-inserts before you have adjacent numbers. Choosing a larger initial spacing limits how many first / last inserts you can perform. Using a 64-bit integer, you are limited to ~63 operations either way, which you need to split up between mid-inserts and first/last inserts a priori.
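The spacing limit can be checked with a small Python sketch. The function name is hypothetical; it simply counts how many times a new key can be placed next to the same neighbour before the two integers become adjacent.

```python
def worst_case_int_inserts(gap):
    """Count mid-inserts possible between two integer keys `gap` apart
    when every new item is placed directly after the lower key."""
    lo, hi = 0, gap
    count = 0
    while hi - lo > 1:        # stop once the keys are adjacent
        hi = (lo + hi) // 2   # the new item becomes the upper neighbour
        count += 1
    return count

print(worst_case_int_inserts(1024))   # 10
print(worst_case_int_inserts(2**63))  # 63
```

Every doubling of the initial spacing buys one more worst-case mid-insert and costs one bit of range for first/last inserts.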

Using floating point values removes the need for selecting spacing a priori. The algorithm is simple:

The first element inserted has a sort key 0.0
An element inserted (or moved) first or last has the sort key of the first element - 1.0 or last element + 1.0, respectively.
An element inserted (or moved) between two elements has a sort key equal to the average of the two.
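The three rules above can be sketched in Python as follows. The function name, the use of None for a missing neighbour, and the ValueError on exhausted precision are assumptions made for this example.

```python
def new_sort_key(before, after):
    """Return a sort key for an item placed between two neighbours.

    `before` / `after` are the neighbours' keys, or None when the item
    becomes the first / last element (both None for an empty list).
    """
    if before is None and after is None:
        return 0.0                 # first element ever inserted
    if before is None:
        return after - 1.0         # new first element
    if after is None:
        return before + 1.0        # new last element
    key = (before + after) / 2.0   # mid-insert: average of the neighbours
    if not before < key < after:
        # Precision is exhausted; the caller would have to renumber items.
        raise ValueError("no representable key between the neighbours")
    return key

print(new_sort_key(None, None))  # 0.0
print(new_sort_key(0.0, None))   # 1.0
print(new_sort_key(0.0, 1.0))    # 0.5
```

Only the moved item's key is written; the explicit check turns the rare precision failure into an error instead of a silently wrong order.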

Using a double-precision float allows 52 worst-case mid-inserts and effectively infinite (about 1e15) first/last inserts. In practice, when moving items around, the algorithm should self-correct, as every time you move an item first or last it expands the range that can be used.
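The figure of 52 can be reproduced with a short Python experiment. This is a sketch under the assumption that the two neighbours start at 1.0 and 2.0 and that every new item is placed directly after the lower one; the function name is hypothetical.

```python
def worst_case_float_inserts(lo=1.0, hi=2.0):
    """Count how many times a key can be inserted directly after `lo`
    before the average is no longer strictly between the neighbours."""
    count = 0
    while True:
        mid = (lo + hi) / 2.0
        if not lo < mid < hi:   # average rounded onto a neighbour
            return count
        hi = mid                # the new item becomes the upper neighbour
        count += 1

print(worst_case_float_inserts())  # 52
```

The count depends on where the neighbours sit on the number line: a double has 52 explicit mantissa bits, and keys close to 0.0 leave more halvings available than keys near 1.0.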

Double-precision floats also have the benefit that they are supported by all platforms and easily stored and transported by practically all transport formats and libraries. This was what we ended up using.

Related Solutions

Algorithms – Name of Algorithm for Converting Strings to Numbers

This is an application of Horner's method for evaluating polynomials. It is based on the observation that a number abcdef in a positional numeral system with base k is the polynomial a*x^5+b*x^4+...+f*x^0 evaluated at x=k.

As to why there is an asymmetry between the integer and the fractional parts, this simply mirrors the asymmetry between the exponents 0, 1, 2, ... on the one side and -1, -2, ... without the zero on the other.
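A minimal Python sketch of both halves, assuming the digits are already available as lists of integers (the function names are made up for this example). The integer part multiplies while reading left to right; the fractional part divides while reading right to left, which is the asymmetry mentioned above.

```python
def integer_part(digits, base):
    """Horner's method: ((a*k + b)*k + c)... for digits a, b, c, ..."""
    value = 0
    for d in digits:               # most significant digit first
        value = value * base + d
    return value

def fractional_part(digits, base):
    """Same idea for exponents -1, -2, ...: evaluate from the last digit
    backwards, dividing by the base at every step."""
    value = 0.0
    for d in reversed(digits):     # least significant digit first
        value = (value + d) / base
    return value

print(integer_part([1, 2, 3], 10))     # 123
print(fractional_part([1, 0, 1], 2))   # 0.625, i.e. binary 0.101
```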

Algorithm to update priority ordered list where changing priority of an item is expensive

If you're stuck with your list data structure

If you strive purely to minimize the number of priority updates, there's not too much you can do beyond the binary-search approach.

Otherwise, one possible smallish improvement I can think of is, whenever you insert an item C between B and D, check the item A to the left of B and E to the right of D and redistribute the priorities of B, C and D evenly between them.

So for something like:

        C
        ↓
A B           D E
0 1           7 8

You'd end up with:

A   B   C   D   E
0   2   4   6   8

Note that A and E don't change, their priorities are just required in the calculation.
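The redistribution in this example can be sketched in Python as below. The function name and signature are hypothetical: it takes the two fixed outer priorities and the number of items to spread between them.

```python
def redistribute(left_fixed, right_fixed, count):
    """Return `count` evenly spaced priorities strictly between the two
    fixed outer priorities, which themselves are not changed."""
    step = (right_fixed - left_fixed) / (count + 1)
    return [left_fixed + step * i for i in range(1, count + 1)]

# A = 0 and E = 8 stay put; B, C and D are spread out between them.
print(redistribute(0, 8, 3))  # [2.0, 4.0, 6.0]
```

This costs three priority updates instead of one, in exchange for leaving more room around the insertion point for later inserts.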

Or you can leave B where it is and only update D, or vice versa, depending on whether A and D or B and E are further apart.

If you can pick another data structure

It would be better to pick one where the priority is structurally defined, not explicitly for each element.

A modified balanced binary search tree could work here, where you store the number of descendants (the subtree size) for each node.

The comparison for insertion would then be defined purely on where you want to insert it compared with the counts of each node.

Note that something like a red-black tree doesn't require any comparisons to keep itself balanced.

Update, insert and delete operations will all take O(log n).
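A minimal Python sketch of the idea follows. It swaps in a treap (a randomized balanced tree) instead of a red-black tree to keep the code short, and it assumes each node stores the size of its subtree; all names are made up for the example. The position of an item is never stored, it follows from the sizes, so inserting at an index needs no key comparisons and no priority updates on other items.

```python
import random

class Node:
    def __init__(self, item):
        self.item = item
        self.priority = random.random()  # heap priority keeps the treap balanced
        self.size = 1                    # number of nodes in this subtree
        self.left = None
        self.right = None

def size(node):
    return node.size if node else 0

def update(node):
    node.size = 1 + size(node.left) + size(node.right)

def split(node, k):
    """Split a tree into (first k items, remaining items)."""
    if node is None:
        return None, None
    if size(node.left) >= k:
        left, node.left = split(node.left, k)
        update(node)
        return left, node
    node.right, right = split(node.right, k - size(node.left) - 1)
    update(node)
    return node, right

def merge(a, b):
    """Concatenate two trees; every item of a precedes every item of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b

def insert_at(root, index, item):
    """Insert item so that it ends up at position `index` (0-based)."""
    left, right = split(root, index)
    return merge(merge(left, Node(item)), right)

def to_list(node):
    if node is None:
        return []
    return to_list(node.left) + [node.item] + to_list(node.right)

root = None
for i, name in enumerate(["A", "B", "D", "E"]):
    root = insert_at(root, i, name)
root = insert_at(root, 2, "C")   # place C between B and D
print(to_list(root))             # ['A', 'B', 'C', 'D', 'E']
```

With a treap the O(log n) bound holds in expectation rather than in the worst case; a red-black tree augmented with the same size field would give the worst-case bound.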

Depending on your requirements, you can have a secondary data structure that allows you to look up some node for a given element.
