Java – About insertion sort and especially why it’s said that copy is much faster than swap

algorithmsjavasorting

From Lafore's "Data Structures and Algorithms in Java":
(about insertion sort (which uses copy + shift instead of swap (used in bubble and selection sort)))

However, a copy isn’t as time-consuming as a swap, so for random data
this algo- rithm runs twice as fast as the bubble sort and faster than
the selectionsort.

Additionally, the author doesn't mention how time consuming shift is.

It seems that copying is the simplest pointer assignment operation. While swap is 3 pointer assignment operations, which don't take much time. Further, shift of N elements is N pointer assignment operations.

Can someone explain this seeming inconsistency?

Best Answer

a copy isn’t as time-consuming as a swap

Let's compare a copy to a swap:

int[] a = ...
int idx = ...

// copy:
a[idx] = a[idx + 1];

// swap:
int temp = a[idx];
a[idx] = a[idx + 1];
a[idx + 1] = temp;

Swap allocates a temporary variable performs and three copies - one from the array to the temporary variable, one from one index at the array to another, and one from the temporary variable back to the array. Therefore a swap is at least 3 times as expensive as a copy.

Shifting elements in an array (as opposed to shifting bits in a primitive integer type) is expensive because it is repeated copying:

for (int i = shiftEnd - 1; i >= shiftStart; i--) {
  a[i + 1] = a[i];
}

You can see here that when (shiftEnd - shiftStart) > 3 a shift will perform more copy operations than a swap.

Let's look at the same example with an array of pointers to objects instead of primitive ints:

Object[] a = ...

// copy the pointer to an object from one array element to another
a[idx] = a[idx + 1]; 

// swap the pointer to an object between array elements
Object temp = a[idx];
a[idx] = a[idx + 1];
a[idx + 1] = temp;

You will notice that these operations are exactly the same whether dealing with primitives or objects. The List version is more expensive due to the method-call overhead of accessing the list. Most List implementations use multiple arrays to allow resizing of the list which is more expensive than using a single, fixed-length primitive array.