Database Design – Best Practices for Tagging System

database-designindexingtagging

I'm creating a system where items will have one or more tags.

Now there will be an item table that will store each item, and a tags table, that will store all tags used in the system.

My question is: What is the best design I can have that will allow me to link items to their respective tags, and record how many times an item received a particular tag from a user?

For example, I'm wondering if I should create a third table that would associate a particular item with a particular tag and how many times that tag was selected. The fields would be like this: item_id, tag_id, tag_hits – where tag_hits is the number of times the tag was used. This solutions would associate all items with their tags in a single table.

Or the other solution I'm thinking of is to create a table for each item and record each tag and the amount of the time the tag was used. So, for itemX I would create a table itemX and the fields would be: tag_id, tag_hits. For another itemY, I would create a table called itemY with the same fields. So for every item, there would be a table associating it with its tags. Now there could be hundreds of thousands of items in this system.

I suppose the developers of this website had to make similar consideration because each question is tagged. And you can browse questions by tag. My system will have similar functionality.

Can someone recommend a design solution, taking into consideration all that I have said, and performance consideration. Please indicate where you think indexing might help also. I use SQL databases in my development, but if you think a no-SQL solution would do it, please suggest a design along these lines. please be very specific.

Thanks,
Ron

Best Answer

Go with your first approach, it is pretty much the standard way of modeling this type of relationship. (updated to record user who added the tag)

Items
ItemID (PK)

ItemTags
ItemID (FK)
TagID (FK)

Tags
TagID (PK)

UserTags
ItemID (FK)
UserID (FK)

You could also do this by adding a UserID column to the ItemTags table and allowing duplicate rows for each tag, but the above approach his likely to be more performant.

The second approach seems like a bad idea since it wouldn't give you the flexibility to add more tags in the future and likely would make queries more difficult to write.

As for indexing: Index each of the foreign keys (the primary keys should automatically get indexed).

Use an RDBMS, a NO-SQL solution probably wouldn't give you any benefit for this type of data structure.

Whatever you do, avoid creating multi-value fields with some kind of comma or space delimited values. It is a performance killer and makes queries really hard to write.

Related Topic