How to calculate or approximate the median of a list without storing the list

algorithmmedianoptimization

I'm trying to calculate the median of a set of values, but I don't want to store all the values as that could blow memory requirements. Is there a way of calculating or approximating the median without storing and sorting all the individual values?

Ideally I would like to write my code a bit like the following

var medianCalculator = new MedianCalculator();
foreach (var value in SourceData)
{
  medianCalculator.Add(value);
}
Console.WriteLine("The median is: {0}", medianCalculator.Median);

All I need is the actual MedianCalculator code!

Update: Some people have asked if the values I'm trying to calculate the median for have known properties. The answer is yes. One value is in 0.5 increments from about -25 to -0.5. The other is also in 0.5 increments from -120 to -60. I guess this means I can use some form of histogram for each value.

Thanks

Nick

Best Answer

If the values are discrete and the number of distinct values isn't too high, you could just accumulate the number of times each value occurs in a histogram, then find the median from the histogram counts (just add up counts from the top and bottom of the histogram until you reach the middle). Or if they're continuous values, you could distribute them into bins - that wouldn't tell you the exact median but it would give you a range, and if you need to know more precisely you could iterate over the list again, examining only the elements in the central bin.