Database design for a social networking site

Tags: database, database-design, social-networking

What tables would a social networking site (e.g. Twitter) need?

I have a users table so far. How do I keep track of my followers and of the people I follow?

Should I maintain separate tables for followers and for the people I follow?
Which columns would those tables contain?

Please don't treat this as subjective or off-topic. I am a beginner, and I hope experts can guide me toward a good DB design.

Best Answer

Try having a look at Database Answers, in particular the data models. They have several different designs for various systems. This one is for a social networking site, which may give you an idea of what's required.

You may want to search SO for other social network database questions. I found this one, which had a link to Flickr showing a schema that appears to be from Facebook.

Your database design will be based around your system requirements. Without knowing exactly what you are trying to achieve, it is difficult to give you the best design.
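As one possible starting point for the followers question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a single `follows` table with one row per "A follows B" relationship, rather than separate followers and following tables, since both lists can be read from the same rows. All table, column, and function names here are illustrative assumptions, not taken from the linked data models.

```python
import sqlite3

# Hypothetical schema: every table and column name is an illustrative assumption.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id),   -- one row per relationship
    CHECK (follower_id <> followee_id)        -- no self-follows
);
-- The primary key already covers "whom do I follow?";
-- this index covers the reverse lookup "who follows me?".
CREATE INDEX idx_follows_followee ON follows (followee_id);
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def follow(conn, follower_id, followee_id):
    """Record that follower_id follows followee_id; a repeated follow is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def followers_of(conn, user_id):
    """Usernames of everyone who follows user_id."""
    rows = conn.execute(
        "SELECT u.username FROM follows f "
        "JOIN users u ON u.id = f.follower_id "
        "WHERE f.followee_id = ? ORDER BY u.username",
        (user_id,),
    )
    return [row[0] for row in rows]


def following_of(conn, user_id):
    """Usernames of everyone user_id follows."""
    rows = conn.execute(
        "SELECT u.username FROM follows f "
        "JOIN users u ON u.id = f.followee_id "
        "WHERE f.follower_id = ? ORDER BY u.username",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Because following on a Twitter-style site is one-directional, a single row per direction is enough: if two users follow each other, the table simply holds two rows.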

Related Solutions

Php – Database Relationship or friend table design for a social networking site

Build the schema you need today, not the one you think you'll need 5 years from now.

Do you think Facebook designed their schema to support 400 million users on day one? Of course not. Building for that kind of scale is complicated, expensive, and honestly, if you try it now, you'll probably get it wrong and have to redo it later anyway.

And let's be honest: you have a better chance of winning the lottery than hitting 400 million users any time soon. Even if you do, your project will have hundreds of engineers by then -- plenty of bandwidth for redesigning your schema.

Now's the time to build simple.

Edit to add some solid examples:

YouTube:

They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.

Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.

LiveJournal also grew from a single database on a single server to multiple sharded, replicated databases.

I'm sure you could find a dozen more examples on the High Scalability blog.

Sql – What are the options for storing hierarchical data in a relational database

My favorite answer is what the first sentence in this thread suggested: use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.
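To make the two representations concrete, here is a minimal in-memory sketch that takes an Adjacency List (each node stores its parent) and derives Nested Sets bounds (each node gets a left and right number that enclose all of its descendants). It is a plain depth-first traversal written for illustration, not the fast T-SQL method from the article linked in this answer, and the function names and input format are assumptions.

```python
from collections import defaultdict


def adjacency_to_nested_sets(edges):
    """Convert (node, parent) pairs into {node: (lft, rgt)} nested-set bounds.

    Exactly one pair is expected to have parent None; that node is the root.
    """
    children = defaultdict(list)
    root = None
    for node, parent in edges:
        if parent is None:
            root = node
        else:
            children[parent].append(node)
    if root is None:
        raise ValueError("no root node (parent None) found")

    counter = 1
    lft = {root: counter}
    bounds = {}
    # Iterative depth-first walk: each stack entry is a node plus an
    # iterator over the children that have not been visited yet.
    stack = [(root, iter(children[root]))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, None)
        counter += 1
        if child is not None:
            lft[child] = counter          # entering a child: assign its left bound
            stack.append((child, iter(children[child])))
        else:
            bounds[node] = (lft[node], counter)  # leaving a node: assign its right bound
            stack.pop()
    return bounds


def descendants(bounds, node):
    """All nodes whose bounds fall strictly inside the bounds of `node`."""
    lo, hi = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if lo < l and r < hi)
```

With the bounds in place, "find the whole subtree" becomes a simple range comparison on the two numbers, which is why Nested Sets are attractive for querying, while the parent pointer stays the easier structure to update.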

The problem until now has been that converting an Adjacency List to Nested Sets is frightfully slow, because most people use the extreme RBAR ("row by agonizing row") method known as a "Push Stack" to do it. The conversion has been considered far too expensive to reach the Nirvana of an Adjacency List's simple maintenance combined with the awesome performance of Nested Sets. As a result, most people end up settling for one or the other, especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to convert what MLM'ers would consider a small million-node hierarchy.

I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested Sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

And here's the duration for the new method (with the push stack method in parentheses).

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

You can read about the new method and get a copy of the code at the following URL. http://www.sqlservercentral.com/articles/Hierarchy/94040/

I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article. http://www.sqlservercentral.com/articles/T-SQL/94570/

If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.
