Find the min/max length of directed graph

algorithmsgraph

The problem is simple on paper… but a bit harder when it comes to write the algorithm to solve it.

Let's use the following graph:

enter image description here

First part

This graph has an entry point A and two possible exits B and J. I'm looking for a way to find the minimum and maximum number of nodes needed to go from entry to an exit (the plan is more to find the numbers from ANY node to an exit, but let's go from start to end for now).

In the graph above, the min length is obviously 1 (A -> B). The max distance is trickier to find because of node E which is linked to a "previous" node C and can create an infinite loop. If we exclude E the max length is 6 (A -> C -> D -> F -> G -> I -> B).

However if we keep E what's the best strategy here to avoid infinite loops?

  • Tag links when visited to avoid crossing a link twice?
    • it won't be possible to move after C again when coming from E
  • Set a high number to infinite loop (something like: exit when you have crossed this section more than 1000 times)?
    • every graph containing a tiny loop will have a max length of 1000 which is not relevant

What is the way to go? How to find the min / max distance the most relevant way possible?

Second part

Ultimately the goal is to get the min / max distance from any node of the graph (starting from C, G, or E for instance).

Also some conditions can be set to define which node can be crossed / linked to a given node. For example when you're on E, according to certain values and variable states, only G may be available…

Thanks for your help.

Best Answer

For the shortest path, start at the well known Dijkstra's algorithm.


When you're on E, according to certain values and variable states, only G may be available...

A simple solution - which I want to encourage you to implement - is to implement dijkstra such that it calls a function to get the outgoing links from a given node. You can then do your checks there. When the conditions change, you would then invalidate the results of the search, and execute it again.

Note: It is even better if you can mark nodes as invalidated. That way the algorithm can cache the lists of links and only reevaluate for the invalidated nodes.

That a given link or node could be unavailable is not really a problem if the graph cannot change while it is being used. You simply compute it for the graph as you have it.

Since this is a concern, I suppose that there is something that will be traversing the graph and it could find that it changed from one moment to the other... when that happens, this something has advanced some distance on the graph, and it makes sense to compute the path from the current position to the target on the modified graph.


I am not sure how do you want to define longest path...

Perhaps, a good definition for longest path is: the shortest path that maximizes the number of visited nodes. Under that definition, walking an infinite loop is not beneficial.

We would searching for the shortest path... but we need a new metric to call "distance". My initial intuition is to use a metric that does not increase when we visit a new node.

If the distance does not increase nor decrease when visiting a new node, then the following paths are the same length:

A -> B
A -> C -> F -> G -> I -> B

However, we want the second path to be better (I mean, "shorter", I mean, to have a lower value in the metric, I mean, to be preferible).

A simple solution is to decrease the metric when visiting a new node.

Notes:

  • A negative metric can be problematic. For path finding, it is not safe to use Dijkstra with negatives. We can normalize it to only positive values by using a sigmoid function before comparing.
  • We also need to encode the rule that distance is different depending on whatever or not we have visited a node. You can solve this by similar means to what I described for having parts of the graph unavailable. Except, instead of passing the current node to evaluate the available links, you pass the current path.

With this metric we have:

A B - length: -1

A C D F G I B - length: -6

A C D E G I B - length: -6

A C D E B - length: -4

A C D E J - length: -4

A C D E C D F G I B - length: -5
        ¯ ¯
A C D E C D E G I B - length: -3
        ¯ ¯ ¯
A C D E C D E B - length: -1
        ¯ ¯ ¯
A C D E C D E J - length: -1
        ¯ ¯ ¯
A C D E C D E C D E F G I B - length: -1
        ¯ ¯ ¯ ¯ ¯ ¯

Then our "longest" path is one of the two longer paths that does not loop (A C D F G I B or A C D E G I B).

In case it is not evident, a path with a loop is always considered worse than its variant without loops because the loop traverses visited nodes, and thus increases the metric. And remeber that we are minizming the metric.

If you are using Dijkstra with a sigmoid of this metric - as I suggest - the loops will be discarded because - as explained above - they will always yield worst paths that those without loop.

Addendum: After posting the answer, I notice that I got some interesting results... for instance: A C D E C D F G I B manages to visit more nodes than A C D F G I B but scores worse (higher), that is not what I expected. I suppose we can tweak it to be more or less interested in going out of its way to reach more nodes before going to the target by applying factors to the added or decreased distance... for example, we could increase only half the value when traversing a visited node. What is the ideal proportion to increase? I have no idea. For practical purposes it can be tweaked to suit the design or problem domain.