Hadoop and HBase


Hi, I am new to HBase and Hadoop. I couldn't find out why we use Hadoop with HBase. I know Hadoop is a file system, but I read that we can use HBase without Hadoop, so why are we using Hadoop?

Best Answer

Hadoop is a platform that allows us to store and process large volumes of data across clusters of machines in parallel. It is a batch processing system where we don't have to worry about the internals of data storage or processing.

It provides not only HDFS, the distributed file system for reliable data storage, but also a processing framework, MapReduce, that processes huge data sets across clusters of machines in parallel.
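To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps using word count. It is a toy model in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and on a real cluster the framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each chunk of input would be mapped on a different machine;
    # here everything runs sequentially in one process.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["hadoop stores data", "hbase stores data on hadoop"])
print(counts)
```

The point of the split is that mappers never need to see each other's input and each reducer only needs the values for its own keys, which is what lets the work be spread over many machines.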

One of the biggest advantages of Hadoop is that it provides data locality. By that I mean that moving data that is so huge is costly, so Hadoop moves the computation to the data. Both HDFS and MapReduce are highly optimized to work with really large data.

HDFS assures high availability and failover through data replication, so that if any one of the machines in your cluster goes down because of some catastrophe, your data is still safe and available.
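As a rough illustration of why replication keeps data available, the sketch below places each block on several distinct nodes and then checks which blocks can still be read after some nodes fail. This is a simplified assumption-laden model: the helper names, node names and round-robin placement are hypothetical, and real HDFS uses a rack-aware placement policy. The replication factor of 3 used here matches the usual HDFS default.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin, not HDFS's real policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_a", "blk_b", "blk_c"], nodes, replication=3)

# One machine goes down: every block still has live replicas elsewhere.
survivors = readable_blocks(placement, failed_nodes={"node2"})
print(sorted(survivors))
```

A block only becomes unreadable when every node holding one of its replicas is down at the same time, which is far less likely than a single machine failing.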

On the other hand, HBase is a NoSQL database. We can think of it as a distributed, scalable, big data store. It is used to overcome pitfalls of HDFS such as the inability to do random reads and writes.

HBase is a suitable choice if we need random, realtime read/write access to our data. It was modeled after Google's BigTable, while HDFS was modeled after GFS (the Google File System).
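The difference between the two access patterns can be sketched with a toy model. `AppendOnlyFile` stands in for a plain file on HDFS, where finding one record means scanning, and `SortedKeyStore` stands in for an HBase table, where rows are kept sorted by row key and a single cell can be read or overwritten directly. Both classes are hypothetical illustrations written for this answer, not the HBase client API, and they ignore how HBase actually persists data.

```python
import bisect


class AppendOnlyFile:
    """Toy stand-in for a file on HDFS: records can only be appended and scanned."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        self._records.append((key, value))

    def find(self, key):
        # No index: every lookup walks the whole file, and the last write wins.
        result = None
        for k, v in self._records:
            if k == key:
                result = v
        return result


class SortedKeyStore:
    """Toy stand-in for an HBase table: rows sorted by row key, cells addressed by column."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value  # random write: overwrite one cell

    def get(self, row_key, column):
        return self._rows.get(row_key, {}).get(column)  # random read by key

    def scan(self, start_key, stop_key):
        """Return rows with start_key <= row key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedKeyStore()
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:name", "Alice")
table.put("user#001", "info:name", "Alicia")   # update a single cell in place

name = table.get("user#001", "info:name")
rows = table.scan("user#001", "user#002")
print(name, rows)
```

The `info:name` column follows HBase's `family:qualifier` naming; the row keys and values are made-up sample data.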

It is not necessary to use HBase on top of HDFS only. We can use HBase with other persistent stores like S3 or EBS. If you want to know about Hadoop and HBase in detail, you can visit the respective home pages, hadoop.apache.org and hbase.apache.org.

You can also go through the following books if you want to learn in depth: "Hadoop: The Definitive Guide" and "HBase: The Definitive Guide".
