How to this deterministic linear time selection algorithm be linear

algorithmssorting

I'm trying to understand the basic concepts of algorithms through the classes offered at Coursera (in bits and pieces), I came across the deterministic linear time selection algorithm that works as follows:

  • Select(A,n,i)
    1. If n = 1 return A[1].
    2. p = ChoosePivot(A, n)
    3. B = Partition(A, n, p)
    4. Suppose p is the jth element of B (i.e., the jth order statistic of A). Let the “first part of B” denote
      its first j − 1 elements and the “second part” its last n − j elements.

      • If i = j, return p.
      • If i < j, return Select(1st part of B, j − 1, i).
      • Else return Select(2nd part of B, n − j, i − j).

And sorts the array internally in the ChoosePivot subroutine to calculate the median of median using a comparison based sorting algorithm. But isnt the lower bound on comparison based sorting O(nlogn)? So how would it be possible for us to acheive O(n) for the entire algorithm then?
Am I missing something here?

Best Answer

And sorts the array internally in the ChoosePivot subroutine to calculate the median of median using a comparison based sorting algorithm.

This is the incorrect premise (you are right that if we did do a comparison-based sort, we couldn't be linear). The point of median-of-medians is that it is linear - you don't need to sort the whole set if all you want is the median. See proof of O(n) running time on wikipedia for more detail.

Related Topic