Languages with graph data structures and algorithms in standard library

graphgraph-traversalstandard-library

I am trying to improve my knowledge and ability with graphs and graph algorithms and have noticed something curious: as far as I can tell no "mainstream" language contains support for graphs in its standard library. Trees yes, graphs no. The only language that comes even remotely close is C++ with Boost (which isn't technically a "standard library")

I'm not sure exactly why this is, but it's got me wondering: are there any programming languages out there that come with solid support for graphs as part of their standard library?

Edit: To clarify: by "graph" I mean a network of nodes and edges; the sort of thing you'd run Dijkstra's algorithm on, not a "graph" in the sense of "graphical display of data".

Best Answer

One way of looking at this is that any time you have a data structure that references another data structure, you have some form of object graph. On that level, any object-oriented language supports a graph. Some languages are not object oriented and instead use a map to store related data structures. This also creates graphs of related maps, so any language with an associative array (PHP) or hashMap (Clojure, perl, etc) can make graphs with maps.

But I think you are asking why there aren't more tookits that have something like

Node {
    Iterator<Node> getAdjacentNodes();
}

Graph {
    void putNode(Node n);
    Iterator<Node> getAllNodes();
}

Look at Datomic. Also triple-stores use something like this. Similarly, cross-reference (xref) tables in relational databases can be used to create arbitrary graphs of relationships.

Having said that, I might restate your question as "Why don't more programming languages and tool-kits take a graph-based approach to problem decomposition?"

I think the answer is that efficient generic graph traversal is an NP-Hard problem lacking both a universal, efficient way to process graphs, and also a universal, efficient way to prove that a single solution is the most efficient one (e.g. the Travelling Salesman problem).

Graphs are absolutely essential for every CS student to learn because they deal with the most general relationships between data. Yet for that same reason, they should be your last choice for how to represent relationships. Most relationships are more specific. A car should only have one driver and one position on a given road at a time. There is generally no reason to join all cars with all possible drivers and all possible roads. Doing so would invite unnecessarily difficult problems. The more specific you can make the relationships between your objects or functions in your program design, the more you can optimize those relationships and the easier it is to understand and test them.

Finally, how do you traverse graphs? There is depth-first and breadth-first. Most programming languages make breadth-first traversal easy:

select * from passenger where car_id = 3;

myCar.getCurrentPassengers();

When you look at depth-first search, there is no one right way to do it. What's the best road from New York to San Francisco? It's solvable, but only by a fairly brute-force approach. If intersections are nodes and roads are edges, that doesn't help you solve this problem. Far better is it to know something about the roads - which ones are interstates, have traffic lights or stop-signs, are under construction, are prone to traffic, are closed, or not built yet? Your NYC to San Fran route search might start with just interstates, then examine the angles of intersection where you change roads, or the maximum distance from a straight (well, curved around the globe) line. These are footholds to finding an optimal solution efficiently. They hint at places to check for more local roads that work around a less-than-ideal interstate solution. The details of the problem are exactly what makes real-time, real-world route finding possible. Yet the details are precisely the aspect of graph traversal that a generic API would not capture.

There could be myNode.setWeight(double d) and myNode.getWeight() methods, but they still don't define an iteration order in a way that's any better than you just adding a weight entry to your hashMap or weight field on your object. Also, there might be different weights for bicycle routes or public transit, or whatever. Would you have a set or hash of weights for every node? I guess I don't see the use of a generic graph data structure because only the non-generic aspects of your data can make a graph traversable in anything close to polynomial time.

Related Topic