Database – What’s the best way to modularize a User schema so it’s generic

Tags: database, database-design, mongo, schema

I have a database design question. I want to create a schema for a User model and then reuse it in other models that extend User, and I want the design to be generic enough to use in every application.

For example, a Profile or Account model might extend User. Both will differ depending on the web application you are designing, but the core credentials in User should never differ across web applications.

What fields do you think should be in the User model?

I think the bare minimum to successfully handle authentication would be:

email (the login unique identifier)
salt (obviously!)
password (duh)
lostToken (a hash to verify lost password functionality)
role (member, admin, editor, etc. IMO the list of roles would differ between sites, but it seems too important to the User model to leave out?)
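
As a rough sketch of how those five fields could fit together (not a vetted authentication scheme), here is a Python dataclass. The field names come from the list above; the `create` and `check_password` helpers, the PBKDF2 iteration count, and the 16-byte salt are assumptions made only for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 from the standard library.
    # The iteration count is an assumption, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class User:
    email: str                       # the unique login identifier
    salt: bytes                      # random per-user salt
    password: bytes                  # derived hash, never the plain text
    role: str = "member"             # member, admin, editor, ...
    lostToken: Optional[str] = None  # set only while a reset is pending

    @classmethod
    def create(cls, email: str, plain_password: str, role: str = "member") -> "User":
        # Hypothetical helper: generate a fresh salt and store only the hash.
        salt = secrets.token_bytes(16)
        return cls(email=email, salt=salt,
                   password=hash_password(plain_password, salt), role=role)

    def check_password(self, candidate: str) -> bool:
        # Constant-time comparison of the stored hash and the candidate's hash.
        return hmac.compare_digest(self.password, hash_password(candidate, self.salt))
```

Something like `User.create("ann@example.com", "s3cret").check_password("s3cret")` would then return `True`, and the plain-text password is never stored on the object.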

Now we get into other interesting fields that are still very useful:

createdAt (when the account was created)
ipAddress (the IP address the account was created from)
refererUrl (which site the user came from)
lastLoggedIn (the last time the user logged in)
isOnline (is the user currently online)

And even more fields that are still pretty useful:

username (might not be used on every site)
number of consecutive logins (similar to the Stack Exchange network)

I think anything else, like social data (likes, votes, profile views), badges/achievements, the last time they updated their profile/account/whatever, and other info like their name, belongs in the per-site Profile model.
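
To make the split concrete, here is one possible sketch in Python: a small core User holding the shared fields, and a per-site Profile that extends it. The Profile field names (`displayName`, `profileViews`, `badges`) are hypothetical examples of site-specific data, not part of any fixed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class User:
    # Core fields meant to stay the same across applications.
    email: str
    role: str = "member"
    createdAt: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    lastLoggedIn: Optional[datetime] = None


@dataclass
class Profile(User):
    # Site-specific fields (hypothetical names); each site defines its own.
    displayName: str = ""
    profileViews: int = 0
    badges: List[str] = field(default_factory=list)


profile = Profile(email="ann@example.com", displayName="Ann")
profile.badges.append("first-post")
```

Because Profile only adds fields, any code written against User (login, role checks) should keep working when it is handed a Profile.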

What do you think?

Edit:
I fully understand that this question is partly subjective, but I do think there's definitely room for discussion.

Best Answer

My first instinct would be to reuse an existing user authorization mechanism, rather than writing one. Then I would skip the rest until I had firm requirements. Premature generalization is just as bad as premature optimization.

A bit about relational databases

One of the most powerful features of relational databases is the ability to connect sets of data through common points. In order to do this efficiently, a database should follow the rules of normalization. To sum those rules up, a database should have:

No repeating elements or groups of elements
No partial dependencies on a concatenated key
No dependencies on non-key attributes

With these rules in mind, an example table for the comments could look like this:

----------------------------------------------------------------------    
| comments                                                           |
|--------------------------------------------------------------------|
| comment_id (key, auto-increment) |  comment  | post_id |  user_id  |
|----------------------------------|-----------|---------|-----------|
| 1                                | <text...> |    1    |   123452  |
----------------------------------------------------------------------

Here, the comment_id is the 'key,' or unique identifier, of the table. It's also been set to 'auto increment,' which means that it will automatically increase its value as 'records,' or rows, are added to the table. As shown, the table containing information related to comments only knows what it needs to know. How, then, do you relate user-specific information to a comment? This is where the 'relational' part of 'relational database' comes into play:

------------------------------------------------------------------------------
| users                                                                      |
|----------------------------------------------------------------------------|
| user_id (key, auto-increment) |   avatar  | additional fields not shown... |
|-------------------------------|-----------|--------------------------------|
| 123452                        | <img_url> | ...                            |
------------------------------------------------------------------------------

Note that the user_id column contains the same data for both the comments table and the users table. This way, you can 'join' data from the two tables. For example, to get all of the comments made by a user, you could run the MySQL query:

SELECT comment_id, comment, post_id FROM comments NATURAL JOIN users WHERE user_id=123452;
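
To try the idea end to end, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. SQLite stands in for MySQL here, the join is written as an explicit JOIN ... ON rather than a NATURAL JOIN, and the sample values ("First!", "avatar.png") and the `comments_by_user` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    avatar  TEXT
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    comment    TEXT NOT NULL,
    post_id    INTEGER NOT NULL,
    user_id    INTEGER NOT NULL REFERENCES users(user_id)
);
""")

# Sample rows mirroring the two tables above (values are placeholders).
conn.execute("INSERT INTO users (user_id, avatar) VALUES (?, ?)",
             (123452, "avatar.png"))
conn.execute("INSERT INTO comments (comment, post_id, user_id) VALUES (?, ?, ?)",
             ("First!", 1, 123452))


def comments_by_user(conn, user_id):
    # Join on the shared user_id column so the avatar comes from users,
    # while the comment data comes from comments.
    return conn.execute(
        "SELECT c.comment_id, c.comment, c.post_id, u.avatar "
        "FROM comments AS c JOIN users AS u ON u.user_id = c.user_id "
        "WHERE c.user_id = ?",
        (user_id,),
    ).fetchall()


rows = comments_by_user(conn, 123452)
```

The avatar lives only in users, yet each returned row carries it alongside the comment, which is the point of joining on user_id.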

This method also answers the question:

How should the user keep track of all of their posts, comments, and favorites?

The user table should not keep track of such information in itself, but rather the respective tables should contain references to a globally unique user ID.

Almost there

So basically, you were on the right track. The only real change from the model you specified is to move the user's avatar information into the table concerning users, instead of keeping it in the table concerning comments.

As you seem to be leaning away from using a raw SQL database, you could consider the tables to be classes, and use the rules of normalization as a design guideline.
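
Treating the tables as classes might look something like the following Python sketch. The class and field names mirror the tables above; the `comments_by` helper and the sample data are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    user_id: int
    avatar: str


@dataclass(frozen=True)
class Comment:
    comment_id: int
    comment: str
    post_id: int
    user_id: int  # reference to User.user_id, not an embedded User


def comments_by(user: User, comments: List[Comment]) -> List[Comment]:
    # The class-level equivalent of joining comments to users on user_id.
    return [c for c in comments if c.user_id == user.user_id]


alice = User(user_id=123452, avatar="avatar.png")
all_comments = [
    Comment(1, "First!", 1, 123452),
    Comment(2, "Nice post", 1, 7),
]
mine = comments_by(alice, all_comments)
```

The Comment class stores only the user's ID, so the avatar has a single home in User and nothing is duplicated per comment.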

Finally, the Pointers vs. Arrays thing: both of those are very specific to the Parse backend (and admittedly, neither is explained very well). The best comparison I can come up with is that Pointers would be like Lists (as in Java or C#), and Arrays would be like, well, Arrays. The difference between the two is that Arrays can only store a predetermined amount of data, while Lists (or Pointers, in Parse's case) can store an unspecified amount. In theory that amount is unbounded, but in practice it is limited by the space available in the heap. For more information about the difference between Lists and Arrays, see this question. If you are planning to use Parse, I would recommend using Pointers in conjunction with Join Tables (which are essentially wrappers around the SQL method I described above), as those options offer the most flexibility.
