Database Design – Architecture for Social Graph Data with Time Frame

database-design, graph-databases, graph-traversal

I am adding some "social" features to an existing application. There are a limited number of node and edge types. Overall, the data set is relatively small (50,000 – 70,000 nodes of each type), with a number of edges (relationships) between them, almost all of them directional.

This, I know, is relatively easy to represent with an RDF store (such as BrightstarDB), something like Microsoft's Trinity, or really many of the NoSQL options.

The thing that, I think, makes this a unique use case is that each relationship will have a timeframe associated with it (start and end dates). Right now, I'm thinking of just storing this in a relational structure like the one below and dealing with the headaches of "traversing the graph", but I'm looking for suggestions on a better approach (both in terms of data structure and server):

Column
================
From_Node_ID  
Relationship
To_Node_ID
StartDate
EndDate
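
For reference, the "traversing the graph" headache with this relational layout can be kept fairly contained with a recursive query. The following is only a minimal sketch, assuming SQLite via Python's built-in sqlite3 module, a hypothetical table name `edges`, dates stored as ISO-8601 strings (so that string comparison matches date order), and a NULL EndDate meaning "still active". The function name `reachable` and the sample rows are made up for illustration.

```python
import sqlite3


def reachable(conn, start_id, as_of):
    """Return IDs of nodes reachable from start_id over edges active on as_of."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node_id) AS (
            SELECT :start
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.To_Node_ID
            FROM edges AS e
            JOIN reach AS r ON e.From_Node_ID = r.node_id
            WHERE e.StartDate <= :as_of
              AND (e.EndDate IS NULL OR e.EndDate >= :as_of)
        )
        SELECT node_id FROM reach WHERE node_id <> :start ORDER BY node_id
        """,
        {"start": start_id, "as_of": as_of},
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample data using the columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edges (
        From_Node_ID INTEGER NOT NULL,
        Relationship TEXT    NOT NULL,
        To_Node_ID   INTEGER NOT NULL,
        StartDate    TEXT    NOT NULL,  -- ISO-8601 date string (assumption)
        EndDate      TEXT               -- NULL = relationship still active
    )
    """
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
    [
        (1, "follows", 2, "2012-01-01", "2012-06-30"),
        (2, "follows", 3, "2012-03-01", None),
        (3, "follows", 4, "2012-09-01", None),
    ],
)

print(reachable(conn, 1, "2012-04-01"))  # edges 1->2 and 2->3 are active on this date
```

The date filter sits inside the recursive step, so an expired edge stops the traversal at that point instead of being filtered out afterwards.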

Any suggestions or thoughts are welcomed.

Best Answer

You should check out Neo4j, an open-source graph database. It is a mature and well-supported project with well-written, up-to-date documentation. It is Java-based but has client libraries for Ruby, JRuby, PHP, Python, C#, and others. It can run embedded (as a JAR file) or as a server (you connect and operate over HTTP through its REST API). Finally, it is disk-based and transactional.

In Neo4j, you can assign properties to nodes and also to relationships. In your case, that means you can connect two nodes with a relationship that has properties called "StartDate" and "EndDate", or any other property you wish.

For example, if your nodes are users, you may have a relationship type called "interested_in". You connect UserA to UserB with an "interested_in" relationship, set StartDate to the date you created the relationship, and later set an EndDate property to the date the relationship came to an end.

It may look like:

UserA -[interested_in]-> UserB
      StartDate: 20121102
      EndDate:   20121107
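
To make the idea of properties on a relationship concrete, here is a small plain-Python model of the example above. It is only an illustrative sketch and does not use Neo4j or any of its client libraries. The class name `Relationship` and the method `active_on` are hypothetical, and a missing EndDate is assumed to mean the relationship is still open.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Relationship:
    """A directed edge that carries its own property map, as in a property graph."""
    start_node: str
    rel_type: str
    end_node: str
    properties: dict = field(default_factory=dict)

    def active_on(self, day: date) -> bool:
        """True if `day` lies within [StartDate, EndDate]; a missing bound is open."""
        start = self.properties.get("StartDate")
        end = self.properties.get("EndDate")
        return (start is None or start <= day) and (end is None or day <= end)


# Create the relationship with a StartDate only...
rel = Relationship("UserA", "interested_in", "UserB",
                   {"StartDate": date(2012, 11, 2)})

# ...and set EndDate later, when the relationship comes to an end.
rel.properties["EndDate"] = date(2012, 11, 7)

print(rel.active_on(date(2012, 11, 5)))
```

Keeping the dates on the edge itself means a traversal can decide per relationship whether to follow it for a given point in time.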

The users (that is, your nodes) can be linked to your existing database by giving each Neo4j node an "id" property that holds the ID from your existing database. Alternatively, you can copy all or some of the properties (e.g. name, surname, birth date) to the nodes in Neo4j, but then you need to synchronize your users between Neo4j and your database every time the data is updated.

There are several example data models in the Neo4j documentation.

There is also InfiniteGraph, which is similar to Neo4j and is described as a "Distributed Graph Database", but I do not have any experience with it. For your case (50,000 – 70,000 nodes of each type), Neo4j would be a perfect match, since it supports up to billions of nodes and relationships.
