Architecture – Designing a social network with CQRS, graph databases and relational databases in mind

Tags: architectural-patterns, architecture, cqrs, graph-databases, social-networks

I have done a fair amount of research on the topic so far, but I couldn't reach a conclusion.

I am designing a social network, and during my research I stumbled upon graph databases. I found Neo4j pretty interesting for user relations and for traversing through nodes. I also thought of using a relational database such as MS SQL Server or MySQL to store entity data only, and depending on Neo4j for the connections between entities. Of course, this means more work in my application to store and pull data in and out of two different sources.

My first question: Is this approach (graph + relational) a good one for designing my social network, keeping in mind that users on social networks don't have to be in sync with the real data to the split second? What are the positives and negatives of this approach?

My second question: I've been doing some reading on CQRS and, as I understand it, it is mostly useful for collaborative environments and for environments where users see a lot of "stale" data. Social networks have shared comments, events, etc., and many users query or update the same data. Could CQRS be a helpful approach? Would it give any performance/scalability benefits, or only needless complexity? Is it compatible with my possible choice of the (graph + relational) databases approach mentioned in the question above?

My goal is to know whether the approaches I have mentioned above seem good enough for the business context.

Best Answer

In my opinion, you are over-engineering the project. I think you do this because you believe you have to rely on cutting-edge techniques to handle business scale, but in many cases you will be better off relying on proven techniques and innovating only in a very focused scope.

A word of caution on graph databases: in my experience, they promise more than they can deliver. My experience dates from some years ago, so I can't tell you whether they have matured as products; but do you want to find out whether something scales by using it as your main workhorse?

Let me remark there are some alternatives for such graph algorithms, some of them with provable scalability because they build on Hadoop's HDFS: see this SO thread or this Spark library.

On the topic of CQRS, it seems to deal with the kind of problems that large websites traditionally handle with a cache layer on top of their database replica sets. Write a wrapper around your queries that first looks into the cache layer, and if that misses its mark, then pulls the data from the database and also writes the result set into the cache. Here is a simple example in Python.
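As a rough illustration of the wrapper described above (this is not the example the answer originally linked to), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory dictionary stands in for a real cache layer, `fetch_comments_from_db` and `FAKE_DB` stand in for a real database query, and the 30-second TTL is arbitrary.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class ReadThroughCache:
    """In-memory stand-in for a cache layer in front of a database."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self._ttl = ttl_seconds
        # key -> (expiry timestamp, cached value)
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call loader() on a miss and cache it."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, possibly slightly stale
        value = loader()  # cache miss: go to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key, for example after a write that changes its data."""
        self._store.pop(key, None)


# Hypothetical stand-ins for a real database and query.
FAKE_DB: Dict[int, List[str]] = {42: ["nice post", "+1"]}
DB_CALLS = {"count": 0}


def fetch_comments_from_db(post_id: int) -> List[str]:
    DB_CALLS["count"] += 1
    return list(FAKE_DB.get(post_id, []))


cache = ReadThroughCache(ttl_seconds=30.0)


def get_comments(post_id: int) -> List[str]:
    """Query wrapper: look in the cache first, fall back to the database."""
    return cache.get_or_load(
        ("comments", post_id),
        lambda: fetch_comments_from_db(post_id),
    )
```

Calling `get_comments(42)` twice in a row hits the database only once; a write path would call `cache.invalidate(("comments", 42))` so that the next read reloads the data. The TTL bounds how stale a read can get, which fits the asker's remark that users do not need split-second consistency.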

Moreover, splitting your queries into Commands and Queries on top of two database engines means that you have to decide, for each user request, whether it is graph-related or not, and whether it writes or just reads something; usually you will have a mix of all four possibilities. If you get your decisions right, you will get a faster, more responsive social network; but be aware that you will get the same performance boost by making the right decisions in almost any language and runtime. And even so, response times will almost certainly be dominated by network latency.
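To make the four-way decision concrete, here is a small Python sketch of such a routing table. The operation names (`follow_user`, `friends_of_friends`, `update_profile`, `get_profile`) and their classification are hypothetical, chosen only to illustrate the bookkeeping; a real application would have many more operations and less clear-cut cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class Store(Enum):
    GRAPH = "graph"
    RELATIONAL = "relational"


class Kind(Enum):
    COMMAND = "command"  # writes something
    QUERY = "query"      # only reads


@dataclass(frozen=True)
class Route:
    store: Store
    kind: Kind


# Hypothetical classification of application operations.
ROUTES: Dict[str, Route] = {
    "follow_user": Route(Store.GRAPH, Kind.COMMAND),
    "friends_of_friends": Route(Store.GRAPH, Kind.QUERY),
    "update_profile": Route(Store.RELATIONAL, Kind.COMMAND),
    "get_profile": Route(Store.RELATIONAL, Kind.QUERY),
}


def plan(operations: List[str]) -> List[Route]:
    """Return, in order, the route for each operation in one user request."""
    return [ROUTES[op] for op in operations]


def distinct_routes(operations: List[str]) -> Set[Route]:
    """Return the distinct (store, kind) combinations a request touches."""
    return set(plan(operations))
```

A single page view that follows a user, reloads their profile, suggests friends of friends and bumps a profile counter would touch all four combinations, which is the mix the paragraph above warns about.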

In your place, I would concentrate on one of the two topics, and I would also concentrate much more on the question: what does this technique enable that is better than existing social networking sites?
