Relational vs Graph Database for (initially) moderately-sized network

big data, graph-databases, postgres

We're developing an application whose data domain (or at least 90% of it) can be modeled effectively using a relational database. We've been using PostgreSQL since the beginning and have had no problems whatsoever. However, now the need arises to store relations (friendships) between users, much like Facebook or Snapchat, and we begin to wonder which of the following two paths is preferable:

  • Begin by storing friendships in a traditional relationship table in PostgreSQL and be done with it until scalability problems arise (namely growth in the number of friendships and the infamous "friend of friend"-type queries).
  • Start upfront with a graph database (TitanDB + Cassandra) just to be ready for when the need to scale arises, but accept a slower start to development (which includes learning TitanDB and Cassandra).
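
To make the first option concrete, here is a minimal sketch of what a relationship table and a "friend of friend" query might look like. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere; the table names, column names, and helper functions are hypothetical, and a real PostgreSQL schema would differ in details (for example, `ON CONFLICT DO NOTHING` instead of `INSERT OR IGNORE`).

```python
import sqlite3

# SQLite stands in for PostgreSQL here; all table, column, and function
# names below are hypothetical and chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendships (
    user_a INTEGER NOT NULL REFERENCES users(id),
    user_b INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_a, user_b),
    CHECK (user_a < user_b)  -- store each undirected friendship exactly once
);
-- The primary key covers lookups by user_a; this index covers user_b.
CREATE INDEX friendships_by_b ON friendships (user_b);
""")

def add_friendship(a, b):
    """Insert an undirected friendship, normalising the pair order."""
    lo, hi = sorted((a, b))
    # SQLite syntax; PostgreSQL would use INSERT ... ON CONFLICT DO NOTHING.
    conn.execute("INSERT OR IGNORE INTO friendships VALUES (?, ?)", (lo, hi))

def friends_of_friends(user_id):
    """Users exactly two hops away: not the user, not already a direct friend."""
    sql = """
    WITH edges(src, dst) AS (      -- view every friendship in both directions
        SELECT user_a, user_b FROM friendships
        UNION ALL
        SELECT user_b, user_a FROM friendships
    )
    SELECT DISTINCT e2.dst
    FROM edges e1
    JOIN edges e2 ON e2.src = e1.dst
    WHERE e1.src = ?
      AND e2.dst <> ?
      AND e2.dst NOT IN (SELECT dst FROM edges WHERE src = ?)
    ORDER BY e2.dst
    """
    return [row[0] for row in conn.execute(sql, (user_id, user_id, user_id))]

# Tiny sample graph: 1-2, 2-3, 2-4, 1-4, 4-5
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
for a, b in [(1, 2), (2, 3), (2, 4), (1, 4), (4, 5)]:
    add_friendship(a, b)

print(friends_of_friends(1))  # [3, 5]
```

Each extra hop adds another self-join on the friendships table, which is where the "friend of friend" scalability concern comes from.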

Our target is ~75M users. We don't really have an idea of what queries we will need to run on this "graph"; for now, our only need is to store this information. Could PostgreSQL scale effectively to such numbers? Is it preferable to take the graph approach upfront?

Best Answer

Your project's success is going to depend much more on the features you put in front of the users you manage to attract than on which database you choose. For now, I would suggest that you prioritize that. After all, if you don't reach 75M users you won't have a scalability problem anyway, so the effort would be wasted.

To phrase this a different way, scalability issues follow from great levels of adoption. Your first problem is the adoption. Work on that first. If you don't work on things that will recruit users, your project will fail and the scalability issue will be moot.
