Graph Algorithms – Finding the Shortest Path

Tags: algorithms, graph

I have a graph with about a billion vertices, each of which is connected to about 100 other vertices at random.

I want to find the length of the shortest path between two points. I don't care about the actual path used.

Notes:

  • Sometimes edges will be severed or added. This happens about 500 times less often than lookups. It's also OK to batch up edge changes if that gets you better performance.
  • I can pre-process the graph.
  • If it takes more than 6 steps, you can just come back with infinity.
  • It's acceptable to be wrong 0.01% of the time, but only in returning a length that's too long.
  • All edges have a length of 1.
  • All edges are bidirectional.

I'm looking for an algorithm. Pseudocode, English descriptions, and actual code are all great.

I could use A*, but that seems optimized for heuristic-guided pathfinding.
I thought about using Dijkstra's algorithm, but it has a step which requires setting the shortest-path-found attribute of every vertex to infinity.

(If you're wondering about the use-case, it's for the Underhanded C Contest.)

Best Answer

Basic Algorithm

Maintain two sets of nodes: those reachable from the start node and those reachable from the end node. Expand the two sets in alternating fashion, up to three steps from each side, each time replacing a set with the nodes you can reach through one more step. After each step, check the two sets for common nodes.
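The alternating expansion above can be sketched in Python with plain sets (the sorted-list optimizations come later). Here `neighbors(v)` is an assumed callback that yields the nodes adjacent to `v`:

```python
def bidirectional_distance(start, end, neighbors, max_steps=6):
    """Return the shortest-path length between start and end,
    or infinity if it exceeds max_steps.

    neighbors(v) is assumed to yield the nodes adjacent to v.
    """
    if start == end:
        return 0
    front, back = {start}, {end}
    for step in range(1, max_steps + 1):
        # Always expand the smaller frontier to keep the sets small.
        if len(front) > len(back):
            front, back = back, front
        # Replace the set with the nodes reachable in one more step.
        front = {w for v in front for w in neighbors(v)}
        if front & back:
            return step
    return float('inf')
```

Because edges are bidirectional and unit-length, expanding whichever frontier is smaller never reports a length shorter than the true distance, and any split of the total step count across the two sides still detects the meeting point.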

Optimizations

Make sure you can iterate each set in sorted order so that you can search for common nodes in a single sweep: an O(n+m) operation. Each list will contain up to a million nodes (roughly 100³, since each node has about 100 neighbours and each side goes at most three steps).
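The single-sweep check for common nodes is an ordinary two-pointer intersection test over sorted lists; a minimal sketch:

```python
def sorted_intersect(a, b):
    """Check two sorted lists for a common element in one sweep: O(n+m)."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1          # advance the pointer behind the smaller value
        else:
            j += 1
    return False
```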

To extend a set by one step, you have to query all connections of the nodes in the original set and merge them into a new sorted set. Merging two sorted lists can again be done in a single sweep, so you also want to be able to query the connections of a node in sorted order. (This can be preprocessed.)

In the last two steps each new set is the result of merging up to 10,000 of these query results (100² nodes, each contributing one neighbour list). It is best to do this merge adaptively, merging equally sized chunks; that way, the sorted-set data structure can be a simple linked list.
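One way to sketch the k-way merge in Python is with the standard library's `heapq.merge`, which combines already-sorted inputs in O(N log k) total work, comparable to pairwise merging of equally sized chunks:

```python
import heapq

def merge_k_sorted(lists):
    """Merge many sorted lists into one sorted, deduplicated list.

    heapq.merge lazily interleaves the inputs using a heap of size k,
    so the total cost is O(N log k) for N elements across k lists.
    """
    out = []
    for node in heapq.merge(*lists):
        if not out or node != out[-1]:   # drop duplicates across lists
            out.append(node)
    return out
```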

That way the whole algorithm becomes O(6·n + 6·n·log n) = O(n log n), where n is at most 1,000,000.