Finding the shortest path in a fully-connected undirected graph

graphgraph-traversal

I am trying to solve a problem in which I have a list of two-dimensional coordinates, and I want to find the shortest path that connects all of them.

At first I thought this was a case of the Traveling Salesman Problem, however:

  • You don't need to get back to the starting city, which I believe TSP wants you to.
  • On this two-dimensional plane where we work with the euclidean distance metric, the triangle inequality holds, which the normal TSP algorithms don't care about, if I recall correctly.
  • There is no 'you only can visit a node once' rule in this problem; the 'shortest path' could form a tree.

I then thought to 'just make a graph and use Prim's or Kruskal's algorithm to find the (length of the) minimum spanning tree'. However, the graph representations commonly used are either an adjacency matrix, which seems a waste for an undirected graph, or an adjacency list, which is slower for a sparse graph (and a fully-connected graph is of course the exact opposite of sparse).

I have the feeling that I am discarding information this specific problem has, which will result in using more time and/or memory that would be required to solve the shortest connecting path problem for a fully-connected, undirected graph in 2d-space.

So:

  • What would be the most efficient practical data representation to store this graph in?
  • What would be the most efficient algorithm to find the length of the shortest path connecting all nodes?

Best Answer

For storing the complete graph use an adjacency matrix because you're going to need to store the distance between every two node [n*(n+1)/2 edges for n nodes]. You'd need to insert n*(n+1) edges if you use an adjacency list.

For storing the minimum spanning tree (MST) use an adjacency list since you only need to store only n-1 edges for a graph of n nodes

For finding the MST you're better off with Prim's algorithm. It works in O(E log(V)) when used with adjacency list. [E edges, V nodes]

Kruskal's algorithm needs to sort all the edges first which in this case is not a good idea because you have a lot of edges.