Clever way to calculate a mode of an online series without storing the series

algorithms

My program's aim is to determine the correct time zone offset (GMT+k) as a series of numbers are fed to it. eg:

-4
+11
1
-4
-4

So, here -4 (the mode of the series) is the correct value. But this requires storing the series then analyzing it and I do not have enough storage (RAM and FLASH) for this. The running average can be calculated easily by storing the list size and the sum. Is there a similar case for finding the mode?

Best Answer

To calculate the mode from an unreasonably large data set (or stream) containing only a finite number of discrete elements (such as integers in a small range), use a table to count the occurrence of each element, where the element in the data set is the key in our table. We then go through the data set and increment that element's counter. At any point we can then determine the mode of the processed part of the data set.

Pseudocode example

 // all elements are integers in range a to b
 a = ...
 b = ...
 table = new Array(size: b - a, default: 0)

 // add items to the table
 for item in stream {
     table[item]++
 }

 // find the mode by sorting the table keys by their values
 mode = range(a, b).sortBy(item => table[item])

Instead of an array, a general purpose Map type (e.g. a hash map) will be needed if the elements aren't integers.