MongoDB and ElasticSearch – How to Properly Index


We are working on a Java EE project that handles a huge amount of data and also has to provide full-text search (in Hungarian).
So we started to think about what kind of architecture could fulfill our requirements. My thoughts are the following:

Using ElasticSearch as the primary database is an antipattern, so it should be used only for indexing and searching.

MongoDB meets our expectations, so it seems to be a good choice as the database.

The problem is: how do we index MongoDB data with ElasticSearch? I created a POC with 13 million documents. I iterated through the documents and, in each iteration, saved the document into MongoDB (which gave me an ID for it), then put the document into ElasticSearch, storing only the Mongo ID there. Document indexing was quite fast, averaging 4.8 ms per document.
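The write path described above can be sketched roughly as follows. This is only an illustration of the pattern, not the code from the POC: `PrimaryStore` and `IdOnlyIndex` are hypothetical interfaces standing in for the MongoDB and ElasticSearch clients, and the in-memory implementations (including the naive tokenizer) exist only so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the primary store (MongoDB in the question).
interface PrimaryStore {
    String save(String body); // returns the generated document ID
}

// Hypothetical stand-in for the search index (ElasticSearch in the question).
interface IdOnlyIndex {
    void index(String id, String text); // analyzes the text, keeps only the ID
}

class InMemoryPrimaryStore implements PrimaryStore {
    final Map<String, String> docs = new HashMap<>();

    @Override
    public String save(String body) {
        String id = UUID.randomUUID().toString();
        docs.put(id, body);
        return id;
    }
}

class InMemoryIdOnlyIndex implements IdOnlyIndex {
    // token -> IDs of the documents that contain the token
    final Map<String, List<String>> postings = new HashMap<>();

    @Override
    public void index(String id, String text) {
        // Naive tokenizer: split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<String> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            if (!ids.contains(id)) {
                ids.add(id);
            }
        }
    }
}

class DualWriteSketch {
    // Save to the primary store first, then index the text under the returned ID.
    static String store(PrimaryStore store, IdOnlyIndex index, String body) {
        String id = store.save(body);
        index.index(id, body);
        return id;
    }

    public static void main(String[] args) {
        InMemoryPrimaryStore store = new InMemoryPrimaryStore();
        InMemoryIdOnlyIndex index = new InMemoryIdOnlyIndex();
        String id = store(store, index, "Budapest is the capital of Hungary");
        System.out.println(index.postings.get("budapest").contains(id));
    }
}
```

Note that the two writes are not atomic: if the second one fails, the store and the index drift apart, which is where the synchronization question comes from.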

When I search with Elastic, it gives me back the matching document IDs, and I can load the documents from Mongo with the $in operator. This also seemed quite fast.
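One detail of this read path is worth sketching: an `$in` query does not return documents in the order of the ID list, so the relevance order from the search has to be restored on the application side. The snippet below is a minimal sketch under that assumption; `StoredDoc`, `findByIdIn` and `loadInRankOrder` are hypothetical names, and the list-backed "collection" merely stands in for a MongoDB collection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StoredDoc {
    final String id;
    final String body;

    StoredDoc(String id, String body) {
        this.id = id;
        this.body = body;
    }
}

class TwoPhaseLookupSketch {
    // Stand-in for a find with {_id: {$in: ids}}: results come back in
    // storage order, not in the order of the requested IDs.
    static List<StoredDoc> findByIdIn(List<StoredDoc> collection, Set<String> ids) {
        List<StoredDoc> found = new ArrayList<>();
        for (StoredDoc doc : collection) {
            if (ids.contains(doc.id)) {
                found.add(doc);
            }
        }
        return found;
    }

    // Fetch the documents in one batch, then restore the ranking of the search hits.
    static List<StoredDoc> loadInRankOrder(List<StoredDoc> collection, List<String> rankedIds) {
        Map<String, StoredDoc> byId = new HashMap<>();
        for (StoredDoc doc : findByIdIn(collection, new HashSet<>(rankedIds))) {
            byId.put(doc.id, doc);
        }
        List<StoredDoc> ordered = new ArrayList<>();
        for (String id : rankedIds) {
            StoredDoc doc = byId.get(id);
            if (doc != null) { // skip IDs the index still has but the store does not
                ordered.add(doc);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<StoredDoc> collection = new ArrayList<>();
        collection.add(new StoredDoc("a", "first"));
        collection.add(new StoredDoc("b", "second"));
        collection.add(new StoredDoc("c", "third"));

        List<String> rankedIds = new ArrayList<>();
        rankedIds.add("c");
        rankedIds.add("stale-id");
        rankedIds.add("a");

        for (StoredDoc doc : loadInRankOrder(collection, rankedIds)) {
            System.out.println(doc.id + ": " + doc.body);
        }
    }
}
```

Skipping IDs that are missing from the store keeps the result usable when the index lags behind deletions, though it can also make a page of results shorter than requested.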

All of this suggests the approach could work, but does it really? I can't figure out at what point this architecture would slow down or where the bottleneck would be. Perhaps it is synchronizing ElasticSearch with Mongo, although that could run in a distributed environment (Hadoop).

So my question: is there a better way to synchronize MongoDB with ElasticSearch?

Best Answer

I had the same requirement and found these references, which could help you.

For Java + MongoDB + ElasticSearch, there is the River plugin, which you can find at https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki

And if you are really going to have a huge amount of data to manage, please read this interesting write-up and the conclusion from Quarkslab: http://blog.quarkslab.com/mongodb-vs-elasticsearch-the-quest-of-the-holy-performances.html

Hope it helps.
