Internet Routing – What Does Path Finding Do and How Is It Different from A*?

algorithmsinternetrouting

Note: If you don't understand this question then feel free to ask clarification in the comments instead of voting down, it might be that this question needs some more work at the moment. I've been directed here from the Stack Excange chat room Root Access because my question didn't fit on Super User.


In many aspects path finding algorithms like A star are very similar to internet routing. For example:

A node in an A* path finding system can search for a path though edges between other nodes.

A router that's part of the internet can search for a route though cables between other routers.

In the case of A*, open and closed lists are kept by the system as a whole, sepratly from any individual node as well as each node being able to temporarily store a state involving several numbers.

Routers on the internet seem to have remarkable properties, as I understand it:

They are very performant.

New nodes can be added at any time that use a free address from a finite (not tree like) address space.

It's real routing, like A*, there's never any doubling back for example.

Similar IP addresses don't have to be geographically nearby.

The network reacts quickly to changes to the networks shape, for example if a line is down.

Routers share information and it takes time for new IP's to be registered everywhere, but presumably every router doesn't have to store a list of all the addresses each of it's directions leads most directly to.

I'm looking for a basic, general, high level description of the algorithms workings from the point of view of an individual router. Does anyone have one?

I presume public internet routers don't use A* as the overheads would be to large, and scale to poorly. I also presume there is a single method worldwide because it seems as if must involve a lot of transferring data to update and communicate a reasonable amount of state between neighboring routers.

For example, perhaps the amount of data that needs to be stored in each router scales logarithmically with the number of routers that exist worldwide, the detail and reliability of the routing is reduced over increasing distances, there is increasing backtracking involved in parts of the network that are less geographically uniform or maybe each router really does perform an A* style search, temporarily maintaining open and closed lists when a packet arrives.

Best Answer

The main difference between the algorithms actually used for Internet routing, and the A* algorithm, is that

  • The A* algorithm requires all the information about a network -- knowledge of every node and exactly how the nodes are connected -- on a single machine, and that machine finds the actual best complete path.
  • packet routing algorithms running on a router are "distributed algorithms" that use various heuristics to get "good" (but rarely the "best") results with very limited amounts of information at each router and very limited amounts of time at each router. Each router does a little bit of the work -- it chooses the next step in the path. None of the routers ever calculate a "complete path", and the actual path traveled by a packet is rarely the best.

Since A* finds the best path, why do we use other algorithms that give us suboptimal paths? The A* algorithm is impossible on the Internet for several reasons. There are now so many nodes on the Internet that it is impractical for a router to have enough RAM to store the IP address of every node and which nodes are connected to it. Even if every router did have plenty of RAM, when a new router is turned on or a new link is installed, we expect that new path to start handling packets more-or-less immediately, rather than waiting for that router to download the connection diagram of the entire internet and waiting for every other router on the planet to update its connection diagram. Many routers have packets coming in so quickly that there's simply not enough time to run a full A* algorithm on each packet. In some cases, changes in the network that affect the path of a packet can happen while the packet is in-flight, after the packet has been sent.

I hear that the simplest possible routing protocol -- "hot potato routing" -- works surprisingly well in small internetworks: Each router keeps track of the IP address of each end-node directly connected to it. If the router receives a packet with one of those as the destination address, then the router sends it to that end-node, and that packet's journey is complete. Otherwise, a router using "hot potato routing" randomly sends the incoming packet to any router directly connected to it.

I hear that the Border Gateway Protocol is currently the most popular routing protocol on the internet. The RFC4271 (and its errata) give detailed information that, in principle, is enough to build a interoperable router.

Such protocols typically use some combination of two techniques to allow each router to build its own routing table:

  • Passive: When a packet from some IP address A comes in on port X, and later some packet with a destination of the same IP address A comes in on some other port, the router knows it should send that second packet out port X. The number of unique addresses associated with a port will eventually become too many to store all of them, so the router uses route summarization. Route summarization relies on the fact that geographically nearby machines do, in fact, typically have similar IP addresses, and IP addresses are handed out in ways that typically form a hierarchial tree of geographical subnets.
  • Active: The router sends out its own IP address, and (a summary of) the IP addresses of the machines directly connected to it, to neighboring routers, typically using the routing information protocol.