Node.js Search Algorithm – Implementing Efficient Search in MongoDB

mongodb, node.js, search

I would like to create a site where users can post articles with the following optional parts:

  • A title
  • Contents (text)
  • Categories
  • Keywords

Articles will be stored in MongoDB and the site will be built in Node.js. Users can search the site through a standard search box.

I'm thinking about creating the following collections:

  • Users
  • Articles
  • Keywords

I will then create an entry in the Keywords collection for each keyword used, with an array containing all the articles that use it. When a user runs a search, the query is split into keywords and each keyword is looked up in the Keywords collection. The matching articles are then retrieved from the database and ranked by relevance.
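To make the idea concrete, here is a minimal in-memory sketch of that Keywords lookup (effectively an inverted index). It does not touch MongoDB; the function names `tokenize`, `buildIndex` and `search`, the sample articles, and the "count of matched terms" ranking are all assumptions for illustration, not a finished design.

```javascript
// In-memory sketch of the Keywords-collection idea (an inverted index).
// All names and the sample data are hypothetical.

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Map each keyword token to the set of article ids that use it.
function buildIndex(articles) {
  const index = new Map();
  for (const article of articles) {
    for (const keyword of article.keywords) {
      for (const token of tokenize(keyword)) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(article.id);
      }
    }
  }
  return index;
}

// Rank articles by how many distinct query terms they match.
function search(index, query) {
  const scores = new Map();
  for (const term of new Set(tokenize(query))) {
    for (const id of index.get(term) || []) {
      scores.set(id, (scores.get(id) || 0) + 1);
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const articles = [
  { id: 1, keywords: ['mongodb', 'search'] },
  { id: 2, keywords: ['node.js', 'search', 'lucene'] },
  { id: 3, keywords: ['cooking'] },
];
const index = buildIndex(articles);
const results = search(index, 'lucene search');
console.log(results);
```

In this sketch an article that matches more of the query terms ranks higher; a real ranking would likely also consider term frequency and which field matched.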

My questions are:

  1. Would a Keywords collection like this be efficient, should I search the Articles collection directly (using full-text search, for example), or should I structure the data some other way?
  2. How could I also search an article's title, contents, or categories rather than just its keywords?
  3. Would it be better to use something like Apache Lucene than to build this functionality myself?
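For context on questions 1 and 2, this is roughly what the full-text option on the Articles collection might look like, as a sketch only. MongoDB supports a text index with per-field `weights` and a `$text` query that can be sorted by `textScore`. The field names follow the list above; the index name, the weight values, and the helper `buildTextQuery` are assumptions. The code only builds plain objects, so it runs without a database.

```javascript
// Sketch: plain objects that could be handed to the MongoDB Node.js driver.
// Field names follow the question; index name and weights are illustrative assumptions.
const textIndexKeys = {
  title: 'text',
  contents: 'text',
  categories: 'text',
  keywords: 'text',
};
const textIndexOptions = {
  name: 'article_text',
  weights: { title: 10, keywords: 5, categories: 3, contents: 1 },
};

// Build the filter, projection and sort for a relevance-ranked text query.
function buildTextQuery(userInput) {
  return {
    filter: { $text: { $search: userInput } },
    projection: { score: { $meta: 'textScore' } },
    sort: { score: { $meta: 'textScore' } },
  };
}

// With a connected driver collection `articles` (not shown), usage would be roughly:
//   await articles.createIndex(textIndexKeys, textIndexOptions);
//   const q = buildTextQuery('node search');
//   await articles.find(q.filter).project(q.projection).sort(q.sort).toArray();
const q = buildTextQuery('node search');
console.log(JSON.stringify(q, null, 2));
```

A single text index covering all four fields would make a separate Keywords collection unnecessary for basic searching, at the cost of less control over ranking.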

Best Answer

I would use an existing platform designed for search. You mentioned Lucene, and there are others available depending on the language you are using.

If you want a standalone, language-agnostic search server, look at Solr. It is built on Lucene, so there is plenty of support for it.
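As a rough sketch of what querying such a server from Node.js could look like, the snippet below builds a URL for Solr's select handler using the eDisMax query parser, which can search several fields at once with per-field boosts via `qf`. The host, port, core name `articles`, the field names and the boost values are all assumptions; it presumes the articles have already been indexed into Solr.

```javascript
// Sketch: build a Solr select URL that searches several fields at once.
// Host, port, the core name "articles" and the boosts are assumptions.
function buildSolrUrl(userInput, baseUrl = 'http://localhost:8983/solr/articles') {
  const params = new URLSearchParams({
    q: userInput,
    defType: 'edismax', // extended DisMax query parser
    qf: 'title^3 keywords^2 categories contents', // fields to search, with boosts
    fl: 'id,title,score', // fields to return
    rows: '10',
    wt: 'json',
  });
  return `${baseUrl}/select?${params.toString()}`;
}

const url = buildSolrUrl('node search');
console.log(url);
// The URL could then be requested with fetch(url) or any HTTP client.
```

Because Solr is reached over plain HTTP, the same request works from any language, which is what makes it language agnostic.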

I personally like Sphinx, but it may not work in your situation; that depends on the language you are using.
