How to find the closest vector to a given vector

algorithms

Let's say I have several points / vectors (in 2D to keep it simple, but could be of any dimension)

   [x1, y1]
   [x2, y2]
   [x3, y3]
   ....
   [xn, yn]

If I pick some point [x', y'], how do I find the closest point to it?

For a more concrete / practical example, imagine these are coordinates of houses. If I have thousands of houses in the database, I'd love to find the closest house to my house. Or more generally, I'd like to find the K closest houses to my house.

One brute-force way to do this is to cycle through each point and find its distance to your point/house and just pick the smallest one. But with thousands or even millions of data points it's not efficient at all.

Is there a faster algorithm at all? Or am I stuck trying to check each point one at a time?

Best Answer

If you have multiple queries, you can use a spatial datastructure to accelerate them. This typically requires preprocessing your target points, and in any case will cost you some time and space.

There are two common classes of acceleration structures: one uses spatial partitions, and the other uses overlapping regions. The k-D tree is an example of the first, while the R-tree is an example of the second.

Of course, if you have only one query, you can't do better overall than checking each point once. But, if you need the query itself to be fast, then preprocessing to build an acceleration structure can get you there.

Related Topic