Fully Stable, In-Place Unique Algorithm in O(n)

algorithms, sorting

Is there an algorithm that, given a sorted array, swaps all the first unique elements to the beginning of the array and the duplicates to the end, while remaining stable for both the unique subarray and the duplicate subarray, and that runs in O(n) swaps (and preferably one pass)? It should return the length of the unique portion of the array.

Contrived example input/result (using Python because it's easy to read):

>>> A = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]   # already sorted
>>> r = Unique(A)
>>> r
5
>>> A
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> A[:r] # the unique portion
[1, 2, 3, 4, 5]
>>> A[r:] # the duplicates portion
[1, 2, 3, 4, 5]

Stability means that even if two or more keys compare equal, their relative order is preserved in the result. E.g. if A[i] == A[j] == A[k] and i < j < k in the original array, all of those relationships still hold across the whole array after running the Unique algorithm, even though A[i] might land in the unique subarray while both A[j] and A[k] land in the duplicates subarray.

My failed attempt is a one-pass that tracks the current unique element while iterating through the array. The next unique element is swapped with the element after the end of the current unique subarray:

def Unique1(A):
    if len(A) <= 1:
        return len(A)  # 0 or 1 elements are trivially unique
    i = 0  # index of the last element of the unique subarray
    for j in range(1, len(A)):
        if A[i] < A[j]:  # A[j] starts a new run of equal keys
            i += 1
            if i < j:
                A[i], A[j] = A[j], A[i]
    return i + 1

It swaps the first unique elements of a sorted array to the beginning, runs in one pass (so it is O(n)), and is stable for the unique subarray, but it does not satisfy the requirements because it scrambles the duplicates subarray:

>>> A = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
>>> r = Unique1(A)
>>> A[r:]          # duplicates part
[3, 2, 4, 1, 5]    # not sorted

While the changes the algorithm makes to the unique subarray are somewhat obvious, I can't "see" what's happening to the duplicates subarray, but I have a feeling it could be un-scrambled if we knew or stored more information about the duplicates.

Note that just re-sorting the duplicates subarray is a non-starter, because (1) it is O(n log n), not O(n), and (2) it breaks stability.

I can also think of another algorithm that swaps each newly found unique item all the way down to its final position, shifting the intervening duplicates up by one. It would satisfy the other conditions, but it needs O(n^2) swaps.
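That quadratic approach could be sketched as follows (a sketch under the question's calling convention; the name Unique2 is mine):

```python
def Unique2(A):
    """Stable for both subarrays, but O(n^2) swaps in the worst case."""
    if len(A) <= 1:
        return len(A)  # 0 or 1 elements are trivially unique
    i = 0  # index of the last element of the unique subarray
    for j in range(1, len(A)):
        if A[i] < A[j]:  # A[j] starts a new run of equal keys
            i += 1
            # Rotate A[i..j] right by one with adjacent swaps: A[j]
            # moves down to position i, and the duplicates in between
            # shift up by one, preserving their relative order.
            for k in range(j, i, -1):
                A[k - 1], A[k] = A[k], A[k - 1]
    return i + 1
```

On the example above this yields r == 5 and A == [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]; because each rotation only shifts duplicates without reordering them, both halves stay stable.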

Is such a unique algorithm possible? And if not, why not?

Best Answer

I'm not sure, but I think you need to drop the "preferably one-pass" from your description. You can do it in three passes by using the following algorithm:

  • Take a pass over the array, counting the number of unique items as unique_count
  • Allocate a new temporary array with size len(A) - unique_count
  • Take a second pass over the array, shifting each first instance of an item to the front of the array while copying every later instance, in order, to the next free slot in the temporary array
  • Take a pass over the temporary array, copying each item to its final position in the input array

This is still O(n), albeit with a somewhat higher constant factor per item than your original implementation.
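The three passes could look like this in Python (a sketch only: Unique3 and dups are my names, and the fixed-size temporary array from the second step becomes a plain Python list here):

```python
def Unique3(A):
    """Stable, O(n) time, but O(len(A) - unique_count) extra space."""
    if len(A) <= 1:
        return len(A)  # 0 or 1 elements are trivially unique
    # Pass 1: count the unique items.
    unique_count = 1
    for j in range(1, len(A)):
        if A[j - 1] < A[j]:
            unique_count += 1
    # Pass 2: shift each first instance down to the front; copy every
    # later instance, in order, into the temporary buffer.  The slot
    # that A[i] overwrites has always been consumed already (it was
    # either moved forward as a unique or appended to dups).
    dups = []
    i = 0
    for j in range(1, len(A)):
        if A[i] < A[j]:
            i += 1
            A[i] = A[j]
        else:
            dups.append(A[j])
    # Pass 3: copy the buffered duplicates to their final positions.
    A[unique_count:] = dups
    return unique_count
```

Both array passes visit each element once and preserve the original relative order of equal keys, so the result is stable in both halves.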