Why do HDFS clusters have only a single NameNode?

hadoop

I'm trying to better understand how Hadoop works, and I came across this:

The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy. Hadoop 0.21+ has a BackupNameNode that is part of a plan to have an HA name service, but it needs active contributions from the people who want it (i.e. you) to make it Highly Available.

from http://wiki.apache.org/hadoop/NameNode
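The checkpointing the quote describes can be pictured as replaying a log of namespace edits against a snapshot. Here is a toy model of that merge, purely for illustration; the real fsimage and edits files are binary formats managed internally by Hadoop, and the dict/tuple representation below is my own invention:

```python
# Toy model of a SecondaryNameNode checkpoint (illustrative only --
# not Hadoop's actual on-disk format or API).

def checkpoint(fsimage, edits):
    """Replay the edit log against the namespace snapshot,
    producing a new fsimage and an empty (truncated) edit log."""
    image = dict(fsimage)  # namespace: path -> metadata
    for op, path, meta in edits:
        if op == "create":
            image[path] = meta
        elif op == "delete":
            image.pop(path, None)
    return image, []

fsimage = {"/data/a": {"replication": 3}}
edits = [("create", "/data/b", {"replication": 2}),
         ("delete", "/data/a", None)]
new_image, new_edits = checkpoint(fsimage, edits)
print(new_image)  # {'/data/b': {'replication': 2}}
```

The point of the sketch is that the SecondaryNameNode only compacts state the NameNode already has; it holds no authority over the namespace, which is why losing the NameNode still takes the file system offline.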

So why is the NameNode a single point of failure? What is bad or difficult about having a complete duplicate of the NameNode running as well?

Best Answer

Why does the design of HDFS have a single NameNode? Simplicity. According to http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html#NameNode+and+DataNodes:

The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata.

Later releases add an HDFS High Availability mode, in which a standby NameNode can take over when the active one fails (see http://hadoop.apache.org/common/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html). Note that this standby is distinct from the SecondaryNameNode mentioned in your quote, which only creates checkpoints and cannot take over. There are also design proposals for fully distributed name nodes, but as far as I know none has been implemented reliably at this time.
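For a sense of what the HA setup involves, here is a sketch of the relevant hdfs-site.xml properties from the HA documentation. The nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, and all host names are placeholders, not defaults; a real deployment also needs fencing and failover-proxy settings that I've omitted:

```xml
<!-- hdfs-site.xml: sketch of an HA NameNode pair.
     "mycluster", "nn1"/"nn2", and host names are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <!-- Shared edit log the standby tails, here via a JournalNode quorum -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
```

The key design point is that both NameNodes read the same shared edit log, so the standby's namespace stays current enough to take over quickly, rather than rebuilding state from scratch.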