Elasticsearch vs Cassandra vs Elasticsearch with Cassandra

cassandra elasticsearch lucene

I am learning NoSQL and evaluating different options for one of my client's requirements. I went through various resources before posting this question, but I still have little knowledge of NoSQL.

  • I need to write data at a high rate and read it back quickly.
  • The store must be fully fail-safe and easily scalable.
  • I need to be able to search through the data for analytics.

I ended up with a short list of: Cassandra and Elasticsearch

What I do understand is that Cassandra is a perfect NoSQL storage solution for me, as I can write data and read it back using indexes. Where it fails, or could fail, is analytics. In the future I may want to fetch data from from_date to to_date, or query it in other ways for analytics, and that will be hard if I don't design the data model properly with the long term in mind, which is quite difficult in an ever-changing world.

Elasticsearch, on the other hand, is best at indexing (backed by Lucene) and can search the data by arbitrary text. But does it work the same way if I want to retrieve data from from_date to to_date? (I expect it does.) The real question is: is it a search engine, or a perfect NoSQL data store like Cassandra? If it is a data store, why do we still need Cassandra?
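For the from_date to to_date case, Elasticsearch's query DSL has a `range` query. Below is a minimal sketch of the request body, built as a plain Python dict so it runs without a cluster. The field name `created_at` is an assumption for illustration, and the sketch presumes that field is mapped as a date.

```python
import json
from datetime import date


def build_date_range_query(field: str, from_date: date, to_date: date) -> dict:
    """Build a search body matching documents whose `field` falls
    within [from_date, to_date], both ends inclusive."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": from_date.isoformat(),  # inclusive lower bound
                    "lte": to_date.isoformat(),    # inclusive upper bound
                }
            }
        }
    }


# `created_at` is a hypothetical field name, not taken from the question.
body = build_date_range_query("created_at", date(2020, 1, 1), date(2020, 1, 31))
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against the index that holds the documents.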

If these two belong to different worlds, please explain how. And how do we combine them to get a more effective solution?

Best Answer

One of our applications uses data that is stored in both Cassandra and ElasticSearch. We use Cassandra to access those records whenever we can, and duplicate the data into query tables, each designed to serve a specific application-side request. For searches more liberal than our query tables allow, ElasticSearch does the job nicely.
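To illustrate the query-table idea, here is a rough sketch that uses in-memory dicts as stand-ins for Cassandra tables. It is not the application's actual schema: the table names, the `save_order` helper, and the record fields are all hypothetical. The point is only that every write goes to one table per access pattern, so each read is a lookup by that table's key.

```python
from collections import defaultdict

# Hypothetical stand-ins for two Cassandra query tables. Each dict key
# plays the role of the partition key of the corresponding table.
orders_by_customer = defaultdict(list)
orders_by_day = defaultdict(list)


def save_order(order: dict) -> None:
    """Write the same record to every query table (deliberate duplication)."""
    orders_by_customer[order["customer_id"]].append(order)
    orders_by_day[order["order_date"]].append(order)


save_order({"order_id": 1, "customer_id": "c42", "order_date": "2020-04-21"})
save_order({"order_id": 2, "customer_id": "c42", "order_date": "2020-04-22"})

# Each supported request is a single key lookup in its own table.
customer_ids = [o["order_id"] for o in orders_by_customer["c42"]]
day_ids = [o["order_id"] for o in orders_by_day["2020-04-22"]]
print(customer_ids, day_ids)
```

A request that no table was designed for, such as free-text matching across fields, has no key to look up. That is the gap the search engine fills.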

We have asked ourselves that same question: "Why don't we just get everything from ElasticSearch?"

The answer is that ElasticSearch was designed to be a search engine, not a persistent data store. Sometimes ElasticSearch loses writes. Schema changes are difficult to make in ElasticSearch without blowing everything away and reloading. For that reason, I have written jobs that keep ElasticSearch in sync with our Cassandra cluster. There was also a fairly recent discussion on Quora about this topic that yielded similar points.
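As a sketch of what such a sync job might compute (this is not the actual job described above), the function below treats the Cassandra rows as the source of truth, compares them with what is currently indexed, and emits lines in the format of Elasticsearch's bulk API. Both inputs are plain dicts here. How the rows and indexed documents are fetched is left out, and the index name `records` is an assumption.

```python
import json


def plan_sync(source: dict, indexed: dict, index_name: str) -> list:
    """Return bulk-API lines that make the index match the source of truth.

    source  -- {doc_id: row} read from the primary store (e.g. Cassandra)
    indexed -- {doc_id: doc} currently present in the search index
    """
    lines = []
    # Index every row that is missing from the index or differs from it.
    for doc_id, row in sorted(source.items()):
        if indexed.get(doc_id) != row:
            lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
            lines.append(json.dumps(row))
    # Delete documents that no longer exist in the source.
    for doc_id in sorted(indexed.keys() - source.keys()):
        lines.append(json.dumps({"delete": {"_index": index_name, "_id": doc_id}}))
    return lines


source = {"1": {"name": "a"}, "2": {"name": "b"}}
indexed = {"1": {"name": "a"}, "3": {"name": "stale"}}
actions = plan_sync(source, indexed, "records")
print("\n".join(actions))
```

Joined with newlines and terminated by a final newline, these lines form the body of a bulk request. Comparing full datasets like this only suits small data; a real job would more likely work from change timestamps or a queue of modified keys.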

That being said, ElasticSearch works great as a search engine. And Cassandra works great as a scalable, high-performance datastore. But querying data is different from searching for data. There are times that we need one or the other, and a combination of the two works well for our application. It may (or it may not) work well for yours.

As for analytics, I have had some success using the Spark Cassandra Connector to serve more complex OLAP queries. Hope that helps.

Edit 20200421

I've written a newer answer to a similar question:

ElasticSearch vs. ElasticSearch+Cassandra
