I have an array of floats, sorted from smallest to largest, and need to be able to pick out the nearest float greater than or less than a passed input value. This input value is not necessarily present as a value in the array.
A naive approach would be to do a simple linear search through the array. That might look like this:
#include <cassert>
#include <vector>

void FindClosestFloatsInArray( float input, const std::vector<float> &array,
                               float *min_out, float *max_out )
{
    assert( input >= array[0] && input < array[ array.size()-1 ] );
    for( size_t i = 1; i < array.size(); i++ )
    {
        if ( array[i] >= input )
        {
            *min_out = array[i-1];
            *max_out = array[i];
            return;   // stop at the first bracketing pair
        }
    }
}
But obviously as the array gets larger, this will become slower and slower.
Does anyone have an idea about an algorithm that would let me find this data more efficiently? I've already switched to a binary search, which has improved matters somewhat, but it's still a lot slower than I'd like, and since I'm not actually looking for a specific value that exists in the array, it can never terminate early.
More information: The floating point values in the array are not necessarily distributed evenly (that is, the array could consist of the values "1.f, 2.f, 3.f, 4.f, 100.f, 1200.f, 1203.f, 1400.f").
I'm doing this operation hundreds of thousands of times, but I can do any amount of pre-processing on the array of floats, if it'll improve the lookup time. I absolutely can change to use something other than a vector to store them, if that'll help.
Best Answer
The code in the question (a linear search), as you rightly point out, is going to get slow for large float arrays. Technically it's O(n) where n is the number of float values in your array.
In general, the best that you can do to find a value in an ordered array is a recursive tree search of some kind (e.g. binary search), which gives you an O(log n) lookup time in the number of elements. O(log n) is much better than O(n) for large values of n.
My suggested approach would therefore be a simple binary search of the array.
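A minimal sketch of such a search, reusing the signature and in-range assertion from the question above (one way to write it; details such as tie-breaking on exact matches can be adjusted):

#include <cassert>
#include <vector>

void FindClosestFloatsInArray( float input, const std::vector<float> &array,
                               float *min_out, float *max_out )
{
    assert( input >= array.front() && input < array.back() );
    size_t lo = 0, hi = array.size() - 1;
    // Invariant: array[lo] <= input < array[hi]. Halve the range until
    // only the two bracketing values remain.
    while( hi - lo > 1 )
    {
        size_t mid = lo + ( hi - lo ) / 2;
        if( array[mid] <= input )
            lo = mid;
        else
            hi = mid;
    }
    *min_out = array[lo];
    *max_out = array[hi];
}

(Equivalently, std::upper_bound from <algorithm> performs the same search: it returns an iterator to the first element greater than input, and the element just before it is the lower neighbour.)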
This is an O(log n) algorithm which should be fast enough for nearly all situations. Intuitively, it works by halving the range to be searched at each step until only the two values bracketing the input remain.
It's really hard to beat the simple binary search, so if you have already implemented this correctly then you may be pretty close to optimal already. However, if you know the distribution of the data and/or have a limited range of input values, there are still some more advanced tricks you can try: interpolation search, which guesses where the input falls based on the data's distribution rather than always splitting the range in half, or a pre-computed lookup table that maps an input directly to the right neighbourhood of the array, making each lookup close to O(1) at the cost of pre-processing and memory. A sketch of the lookup-table idea follows below.
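For illustration only, here is a hypothetical sketch of the lookup-table approach; the names (BucketIndex, BuildBucketIndex, FindClosestFloatsIndexed) and the bucketing scheme are assumptions of this sketch, not taken from the answer. It divides the value range into k evenly sized buckets and records, for each bucket, where that bucket's values begin in the sorted array:

#include <cassert>
#include <cstddef>
#include <vector>

struct BucketIndex
{
    std::vector<size_t> start;  // start[b] = first index i with array[i] >= bucket b's left edge
    float lo, scale;            // bucket of x is (size_t)((x - lo) * scale)
};

BucketIndex BuildBucketIndex( const std::vector<float> &array, size_t k )
{
    BucketIndex idx;
    idx.lo = array.front();
    idx.scale = (float)k / ( array.back() - array.front() );
    idx.start.resize( k + 1 );
    size_t i = 0;
    for( size_t b = 0; b < k; b++ )
    {
        float edge = idx.lo + (float)b / idx.scale;
        while( i < array.size() && array[i] < edge ) i++;
        idx.start[b] = i;
    }
    idx.start[k] = array.size();
    return idx;
}

void FindClosestFloatsIndexed( float input, const std::vector<float> &array,
                               const BucketIndex &idx,
                               float *min_out, float *max_out )
{
    assert( input >= array.front() && input < array.back() );
    size_t b = (size_t)( ( input - idx.lo ) * idx.scale );
    if( b >= idx.start.size() - 1 ) b = idx.start.size() - 2;  // guard against rounding past the last bucket
    size_t i = idx.start[b] > 0 ? idx.start[b] : 1;
    while( array[i] < input ) i++;   // short scan, bounded by the next bucket's start
    *min_out = array[i-1];
    *max_out = array[i];
}

Note that since the values may be unevenly distributed, a single bucket can still hold many elements; the scan inside a bucket could itself be replaced by a binary search over [start[b], start[b+1]], and the bucket count k trades memory for lookup speed. Float rounding at bucket edges would also need care in production code.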
However, unless you are in a very special situation I'd probably recommend sticking with the simple binary search. Reasons: it's simple to implement and hard to get wrong; O(log n) is already very fast (a million-element array takes only about 20 steps per lookup); and the fancier approaches add pre-processing cost and complexity that only pay off if the data's distribution cooperates.