Database Design – Planning a Database Backend Correctly

Tags: database, database-design, MySQL, relational-database

I'm an iOS developer and to be honest I'm not really looking to become a database expert right at this moment, but I do need to know how to properly plan out what I believe to be a pretty cookie cutter social network style backend/database.

I'll keep this short and sweet:

Social network app will have users, posts, comments, and favorites.

Using comments as an example:

Each comment is an object that has been created in my backend for a class called "Comments" (Is this overkill? I don't see how else it could be achieved but I just can't believe how a server would handle so many individual comment objects if an app became popular, but maybe I'm just underestimating the power of a good backend?)

A comment object consists of a URL for the commenter's avatar, the commenter's user ID, a unique object ID, the post object's ID that the comment is for, and the comment text.

Posts and Favorites would also be their own classes/objects in the backend and would follow the same model as comments in the example above.

So my main question is, am I on track with the above? I want to make sure I'm structuring my backend properly.

My second question is, how should the user keep track of all of their posts, comments, and favorites?

My initial idea was to just have array fields on the User class and store the IDs of the posts, comments, and favorites, e.g. the User class has an array field called "Posts" with the IDs of post objects.

What confuses me is whether it's overkill to keep track of these in the User class when I'm already storing the user's ID on the post, comment, and favorite classes. I could just query those classes and filter by the user's ID. But if the app had a ton of usage, it might take longer to query all those objects and filter by user ID than it would if the user kept track of their own objects via fields on the User class.

Last but not least, last week I read the following in some Parse.com (popular Backend as a Service) docs:

"When you’re thinking about one-to-many relationships and whether to implement Pointers or Arrays, there are several factors to consider. First, how many objects are involved in this relationship? If the "many" side of the relationship could contain a very large number (greater than 100 or so) of objects, then you have to use Pointers."

I'm confused on what Pointers are in a backend?

I realize these are all pretty noob questions, but I really want to make sure I set up my backend properly and that its structure makes sense.

Hopefully some of you guys will be willing to read this giant wall of text and help a brotha out.

Thanks for the help!

Best Answer

A bit about relational databases

One of the most powerful features of relational databases is the ability to connect sets of data through common points. In order to do this efficiently, a database should follow the rules of normalization. To sum those rules up, a database should have:

  1. No repeating elements or groups of elements
  2. No partial dependencies on a concatenated key
  3. No dependencies on non-key attributes

With these rules in mind, an example table for the comments could look like this:

----------------------------------------------------------------------    
| comments                                                           |
|--------------------------------------------------------------------|
| comment_id (key, auto-increment) |  comment  | post_id |  user_id  |
|----------------------------------|-----------|---------|-----------|
| 1                                | <text...> |    1    |   123452  |
----------------------------------------------------------------------   

Here, the comment_id is the 'key,' or unique identifier, of the table. It's also been set to 'auto increment,' which means that it will automatically increase its value as 'records,' or rows, are added to the table. As shown, the table containing information related to comments only knows what it needs to know. How, then, do you relate user-specific information to a comment? This is where the 'relational' part of 'relational database' comes into play:

------------------------------------------------------------------------------
| users                                                                      |
|----------------------------------------------------------------------------|
| user_id (key, auto-increment) | avatar    | additional fields not shown... |
|-------------------------------|-----------|--------------------------------|
| 123452                        | <img_url> |                                |
------------------------------------------------------------------------------
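
As a rough sketch, the two tables above could be declared like this. It uses Python's built-in sqlite3 module with an in-memory database purely so the example runs anywhere; that choice, the column types, and the foreign-key constraint are my assumptions, and MySQL's syntax differs slightly (AUTO_INCREMENT rather than AUTOINCREMENT, for instance). The posts table is left out, so post_id is just a plain integer here.

```python
import sqlite3

# Assumed schema: column types and constraints are illustrative, not from the question.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    avatar  TEXT NOT NULL              -- URL of the user's avatar image
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    comment    TEXT    NOT NULL,
    post_id    INTEGER NOT NULL,       -- would reference a posts table (not shown)
    user_id    INTEGER NOT NULL REFERENCES users(user_id)
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn

conn = open_db()
conn.execute("INSERT INTO users (user_id, avatar) VALUES (?, ?)",
             (123452, "https://example.com/avatar.png"))
cur = conn.execute(
    "INSERT INTO comments (comment, post_id, user_id) VALUES (?, ?, ?)",
    ("First!", 1, 123452))
first_comment_id = cur.lastrowid  # assigned by the database, not by the app

# A comment that points at a user who does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO comments (comment, post_id, user_id) VALUES (?, ?, ?)",
        ("orphan", 1, 999))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The comment row never stores the avatar; it only stores the user_id that leads to it.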

Note that the user_id column contains the same data for both the comments table and the users table. This way, you can 'join' data from the two tables. For example, to get all of the comments made by a user, you could run the MySQL query:

SELECT comment_id, comment, post_id FROM comments NATURAL JOIN users WHERE user_id=123452;
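
To see a query like that return something, here is a small runnable sketch. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL (my assumption, as are the sample rows), and it spells the join out with an explicit ON clause instead of NATURAL JOIN, since NATURAL JOIN matches on every column name the two tables share and would silently change meaning if a same-named column were added later. It also selects the avatar, which is the part that actually needs the users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, avatar TEXT);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    comment    TEXT,
    post_id    INTEGER,
    user_id    INTEGER REFERENCES users(user_id)
);
-- Made-up sample data
INSERT INTO users VALUES (123452, 'https://example.com/a.png'),
                         (7,      'https://example.com/b.png');
INSERT INTO comments (comment, post_id, user_id) VALUES
    ('Nice post', 1, 123452),
    ('Me too',    1, 7),
    ('Follow-up', 2, 123452);
""")

def comments_by_user(conn, user_id):
    """Return (comment_id, comment, post_id, avatar) rows for one user."""
    return conn.execute(
        """
        SELECT c.comment_id, c.comment, c.post_id, u.avatar
        FROM comments AS c
        JOIN users AS u ON u.user_id = c.user_id
        WHERE c.user_id = ?
        ORDER BY c.comment_id
        """,
        (user_id,),
    ).fetchall()

rows = comments_by_user(conn, 123452)
```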

This method also answers the question:

"How should the user keep track of all of their posts, comments, and favorites?"

The user table should not keep track of such information in itself, but rather the respective tables should contain references to a globally unique user ID.
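
On the worry that filtering by user ID gets slow as the tables grow: databases normally handle this with an index on the column being filtered, so the lookup does not have to scan every row. The sketch below is an illustration under assumptions, not a benchmark. It uses Python's sqlite3 in place of MySQL, the index name idx_comments_user is made up, and all it does is ask SQLite's query planner whether it would use the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    comment    TEXT,
    post_id    INTEGER,
    user_id    INTEGER
);
-- Hypothetical index name; the point is that user_id is indexed.
CREATE INDEX idx_comments_user ON comments (user_id);
""")

def plan_for_user_lookup(conn, user_id):
    """Return SQLite's description of how it would run the per-user lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT comment_id, comment FROM comments WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

plan = plan_for_user_lookup(conn, 123452)
uses_index = "idx_comments_user" in plan
```

MySQL has its own EXPLAIN statement for the same kind of check.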

Almost there

So basically, you were on the right track. The only real change from the model you specified was to move the user's avatar information to the table concerning users, instead of in the table concerning comments.
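
One practical payoff of keeping the avatar on the users table: changing it is a single-row update, and every comment picks up the new value through the join. A minimal sketch, again assuming Python's sqlite3 as a stand-in for MySQL and made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, avatar TEXT);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    comment    TEXT,
    post_id    INTEGER,
    user_id    INTEGER REFERENCES users(user_id)
);
INSERT INTO users VALUES (123452, 'https://example.com/old.png');
INSERT INTO comments (comment, post_id, user_id) VALUES
    ('one', 1, 123452), ('two', 2, 123452), ('three', 3, 123452);
""")

def change_avatar(conn, user_id, new_url):
    """Update the avatar in the one place it is stored; return rows changed."""
    cur = conn.execute(
        "UPDATE users SET avatar = ? WHERE user_id = ?", (new_url, user_id))
    return cur.rowcount

rows_touched = change_avatar(conn, 123452, "https://example.com/new.png")

# Every comment now resolves to the new avatar without being rewritten.
avatars_on_comments = [
    row[0] for row in conn.execute(
        "SELECT u.avatar FROM comments AS c "
        "JOIN users AS u ON u.user_id = c.user_id "
        "ORDER BY c.comment_id")
]
```

Had the avatar URL been copied onto each comment, the same change would have meant rewriting every comment the user ever made.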

As you seem to be leaning away from using a raw SQL database, you could consider the tables to be classes, and use the rules of normalization as a design guideline.

Finally, the Pointers vs. Arrays thing: both terms are specific to the Parse backend (and admittedly, neither is explained very well). A Pointer is a field on one object that holds a reference to a single other object, which makes it the Parse counterpart of the user_id and post_id columns in the comments table above. An Array is a field that holds a whole list of values or references inside one object, which is what your idea of a "Posts" array on the User class amounts to. An Array is stored as part of the object that owns it, so it gets unwieldy as the list grows; that is why the passage you quoted tells you to use Pointers once the "many" side passes 100 or so objects. With Pointers, each comment carries its own reference to its user, so the number of comments per user is limited only by available storage. If you are planning to use Parse, I would recommend using Pointers in conjunction with Join Tables (which are essentially wrappers around the SQL method I described above), as those are the options offering the most flexibility.
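
Favorites are the one many-to-many relationship in the app (a user favorites many posts, and a post is favorited by many users), which is the case a join table is meant for. In SQL terms that could look like the sketch below; the favorites table, its columns, and the use of Python's sqlite3 rather than MySQL or Parse are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, avatar TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT);
-- Hypothetical join table: one row per (user, post) pair.
CREATE TABLE favorites (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    post_id INTEGER NOT NULL REFERENCES posts(post_id),
    PRIMARY KEY (user_id, post_id)     -- a user can favorite a post only once
);
INSERT INTO users VALUES (1, 'a.png'), (2, 'b.png');
INSERT INTO posts VALUES (10, 'hello'), (11, 'world');
""")

def add_favorite(conn, user_id, post_id):
    """Record a favorite; return False if it already existed."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    cur = conn.execute(
        "INSERT OR IGNORE INTO favorites (user_id, post_id) VALUES (?, ?)",
        (user_id, post_id))
    return cur.rowcount == 1

def favorites_of(conn, user_id):
    """Return the IDs of the posts a user has favorited."""
    return [r[0] for r in conn.execute(
        "SELECT post_id FROM favorites WHERE user_id = ? ORDER BY post_id",
        (user_id,))]

first = add_favorite(conn, 1, 10)
again = add_favorite(conn, 1, 10)   # duplicate, ignored
add_favorite(conn, 1, 11)
add_favorite(conn, 2, 10)
```

Neither the users table nor the posts table grows a list column; the relationship lives entirely in the rows of favorites.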
