Existing architecture: nodejs server with mongodb backend.
I have strings coming in describing images that can have #hashtags in them.
I wish to extract the hashtags from the strings, store the hashtags and associate the image with that hashtag.
So e.g. an image is uploaded with 'having fun at #bandcamp #nyc': #bandcamp and #nyc are extracted.
- If they don't exist as hashtags already, they're created and the image is associated with them both.
- If they do exist, that's recognised and the image is associated with both.
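The extraction step described above can be sketched in a few lines of plain JavaScript. This is only an illustration: the helper name extractHashtags is hypothetical, and it assumes a hashtag is a '#' followed by letters, digits or underscores, with tags lowercased so that #NYC and #nyc count as the same tag.

```javascript
// Hypothetical helper: pull unique, lowercased hashtags out of a caption.
// Assumption: a tag is '#' followed by one or more word characters.
function extractHashtags(caption) {
  const matches = caption.match(/#\w+/g) || []; // match() returns null when nothing is found
  const tags = matches.map((m) => m.slice(1).toLowerCase()); // drop the leading '#'
  return [...new Set(tags)]; // remove duplicates, keeping first-seen order
}

console.log(extractHashtags('having fun at #bandcamp #nyc'));
// prints [ 'bandcamp', 'nyc' ]
```

Whether the leading '#' is stored, and whether tags are case-folded, is a design choice; the sketch simply picks one option.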
So it will be possible to build a mongo find query that gets all images for a hashtag or multiple hashtags.
I'm new to NoSQL. I understand that in a relational database I'd have:
- table hashtags
- table images
- table imageshashtags
with a many to many relationship. An image can have many hash tags, and a hashtag can have many images.
What sort of approach is suitable with mongo?
From reading q&a like this: https://stackoverflow.com/questions/8455685/how-to-implement-post-tags-in-mongo
I see that I can implement a sub document in the image document with the tags. Is that efficient for searching and retrieving?
I could then use http://cookbook.mongodb.org/patterns/count_tags/ – map reduce?
So I'd end up with:
- an images collection, where each image document has a tags subdocument
- a tags collection
When the image is created, the tags are extracted and added to the tags subdocument of the image document. Each new tag is also added to the tags collection if it's not already present (i.e. tags must be unique), and then map reduce is run.
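One way the write path in this plan could look is sketched below. It only builds plain objects and does not talk to a database. The field names (url, caption, tags, imageCount) and both helper names are assumptions, and the imageCount counter is an addition made here as a possible alternative to running map reduce for counts, not something taken from the linked pages. Using the tag text as _id means uniqueness comes from the primary key.

```javascript
// Sketch only: these helpers build plain objects and never touch a database.
// Field names (url, caption, tags, imageCount) are assumptions.
function buildImageDoc(url, caption, tags) {
  return { url, caption, tags, createdAt: new Date() };
}

// One upsert per tag, keyed on the tag text so each tag is stored only once.
// With upsert: true, a missing tag is created and an existing one is updated.
function buildTagUpserts(tags) {
  return tags.map((tag) => ({
    updateOne: {
      filter: { _id: tag },
      update: { $inc: { imageCount: 1 } }, // running count instead of map reduce
      upsert: true,
    },
  }));
}

const doc = buildImageDoc('/uploads/img1.jpg', 'having fun at #bandcamp #nyc', ['bandcamp', 'nyc']);
const ops = buildTagUpserts(doc.tags);
console.log(ops.length); // prints 2
```

With a connected db handle from the Node.js driver, these would be passed to db.collection('images').insertOne(doc) and db.collection('tags').bulkWrite(ops); the collection names are again assumptions.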
Is that sound? Am I understanding things correctly and is my approach sensible?
Best Answer
Store hashtags in an array within a document.
That's the benefit of having documents: you can simply nest them. And, in this particular case, it's trivial:
(I added some "synonymous" tags, this can be done automatically by looking up some other document.)
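As an illustration of such a document, here is a minimal sketch. The field names, the sample values and the synonym table are all assumptions; the withSynonyms helper is hypothetical and stands in for the "look up some other document" step.

```javascript
// Hypothetical synonym table; in practice this could be loaded from its own document.
const synonyms = new Map([['nyc', ['newyork']]]);

// Expand each tag with its known synonyms and remove duplicates.
function withSynonyms(tags, table) {
  const expanded = tags.flatMap((t) => [t, ...(table.get(t) || [])]);
  return [...new Set(expanded)];
}

// An image document with its tags nested as a simple array of strings.
const image = {
  url: '/uploads/img1.jpg',
  caption: 'having fun at #bandcamp #nyc',
  tags: withSynonyms(['bandcamp', 'nyc'], synonyms),
};

console.log(image.tags);
// prints [ 'bandcamp', 'nyc', 'newyork' ]
```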
This is the most natural solution for a document-oriented database:
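Querying such documents might then look like the sketch below, which builds the filter objects that would be passed to collection.find(). The field name tags and the three helper names are assumptions. MongoDB matches a scalar against an array field when any element equals it, $all requires every listed value to be present, and $in requires at least one.

```javascript
// Each helper returns a filter object for collection.find().
// Assumption: image documents keep their hashtags in an array field named `tags`.
const imagesWithTag = (tag) => ({ tags: tag }); // array contains this tag
const imagesWithAllTags = (tags) => ({ tags: { $all: tags } }); // contains every tag
const imagesWithAnyTag = (tags) => ({ tags: { $in: tags } }); // contains at least one tag

console.log(JSON.stringify(imagesWithAllTags(['bandcamp', 'nyc'])));
// prints {"tags":{"$all":["bandcamp","nyc"]}}
```

An index on the array field, e.g. createIndex({ tags: 1 }), is stored as a multikey index, which is what would let these lookups avoid scanning every image.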
On the other hand, if you go with the relational style, you'll be in big trouble when you reinvent a SQL JOIN within your application code. This is one of the most common anti-patterns of using MongoDB (and such). Here's a very typical pseudocode:
This is inefficient, not scalable, and you are simply reinventing the wheel. Using this, you'll probably end up with complexity of O(N*M) because of loops within your code. If you chose SQL with foreign keys instead, you'd have something like O(N*log(M)) or even O(N+M).
There are no tables (relations) and foreign keys in MongoDB. Do not invent them, please. Use SQL instead, if you need them. In fact, I highly suggest using SQL instead of MongoDB, unless your data really consists of documents.
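As a rough illustration of the application-side join warned about above, here is a sketch of the general pattern (not the answer author's own pseudocode). Three in-memory arrays stand in for relational-style collections; all names and data are made up. Against a real database each scan would be a separate query or a pass over fetched results, which is where the nested-loop cost comes from.

```javascript
// In-memory stand-ins for three relational-style collections (made-up data).
const images = [{ _id: 1, url: 'a.jpg' }, { _id: 2, url: 'b.jpg' }];
const hashtags = [{ _id: 10, name: 'nyc' }, { _id: 11, name: 'bandcamp' }];
const imagesHashtags = [
  { imageId: 1, hashtagId: 10 },
  { imageId: 1, hashtagId: 11 },
  { imageId: 2, hashtagId: 10 },
];

// Application-side join: every matching link row triggers a scan of `images`.
function imagesForTag(name) {
  const tag = hashtags.find((h) => h.name === name);
  if (!tag) return [];
  const result = [];
  for (const link of imagesHashtags) {
    if (link.hashtagId !== tag._id) continue;
    for (const img of images) {
      if (img._id === link.imageId) result.push(img.url);
    }
  }
  return result;
}

console.log(imagesForTag('nyc'));
// prints [ 'a.jpg', 'b.jpg' ]
```

With tags embedded in each image document, the same question becomes a single filter on the images collection, with no link collection to walk.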
Typical examples of documents are configurations, forms, and maybe user sessions. Those typically don't fit well into tables because of "random" structure.