The best way to keep track of the median

algorithmsmathperformance

I read a question, and I'm looking for input on how to solve it:

Numbers are randomly generated and stored into an (expanding) array, How would you keep track of the median?

There are two data structures can solve the problem. One is the balanced binary tree, the other is two heaps which keep trace of the biggest half and the smallest half of the elements.
I think these two solutions have the same running time as O(n lg n), but I am not sure of my judgement.

What is the best way to keep track of the median?

My attempt:

In this question,I think a heap is the best way to keep track of the median. There are two heaps, the big heap and the small heap, which need not to be sequential. First, we calculate the mean value of the elements in the array. If the element is less than the mean value, we put the num to the small heap. On the contrary, we put the num to the big heap. If the number of the big heap is equal to the number of the small heap, the biggest one in the small heap and the smallest one in the big heap are the median. If the two heaps have different size, we just pop the root element from the heap with bigger size and push it to the root of the smaller size heap. For big heap, the root element is the smallest one, and for small heap, the root element is the biggest one. In this way, if the two heaps have the same size or a digital difference, we find the medium in the root.

I think this solution has the running time as O(m*n), m means the times we adjust the unbalance heaps.

Is this the best way to keep track of the median?

Best Answer

There's probably more than 2 data structures that solve this problem. Take a look at Approximate Medians and other Quantiles in One Pass and with Limited Memory

They don't use two heaps. I imagine you could modify their algorithm to periodically get a running approximate value of median. How good an approximationwould of course, depend on many factors, not the least of which is how much data has passed through the algorithm.

Related Topic